Merge pull request 'Submit source code' (#14) from develop into master

pull/26/head
p8hw6pnsf 3 years ago
commit 64b68db666

@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

@@ -0,0 +1,338 @@
# Image Augmentation
---
Experiments on data augmentation will be introduced in detail in this section. If you want to quickly experience these methods, please refer to [**Quick start PaddleClas in 30 minutes**](../../tutorials/quick_start_en.md), which is based on the CIFAR100 dataset. If you want to learn about the related algorithms, please refer to [Data Augmentation Algorithm Introduction](../algorithm_introduction/DataAugmentation_en.md).
## Catalogue
- [1. Configurations](#1)
- [1.1 AutoAugment](#1.1)
- [1.2 RandAugment](#1.2)
- [1.3 TimmAutoAugment](#1.3)
- [1.4 Cutout](#1.4)
- [1.5 RandomErasing](#1.5)
- [1.6 HideAndSeek](#1.6)
- [1.7 GridMask](#1.7)
- [1.8 Mixup](#1.8)
- [1.9 Cutmix](#1.9)
- [1.10 Use Mixup and Cutmix at the same time](#1.10)
- [2. Start training](#2)
- [3. Matters needing attention](#3)
- [4. Experiments](#4)
<a name="1"></a>
## 1. Configurations
Since hyperparameters differ among augmentation methods, for better understanding we list 8 augmentation configuration files in `configs/DataAugment` based on ResNet50. Users can train the model with `tools/run.sh`. The configurations of these methods are described below.
<a name="1.1"></a>
### 1.1 AutoAugment
The configuration of the `AutoAugment` data augmentation method is as follows. `AutoAugment` operates on data in uint8 format, so it should be placed before the normalization operation (`NormalizeImage`).
```yaml
transform_ops:
  - DecodeImage:
      to_rgb: True
      channel_first: False
  - RandCropImage:
      size: 224
  - RandFlipImage:
      flip_code: 1
  - AutoAugment:
  - NormalizeImage:
      scale: 1.0/255.0
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
```
<a name="1.2"></a>
### 1.2 RandAugment
The configuration of the `RandAugment` data augmentation method is as follows, where the user needs to specify the parameters `num_layers` and `magnitude`, whose default values are `2` and `5` respectively. `RandAugment` operates on data in uint8 format, so it should be placed before the normalization operation (`NormalizeImage`).
```yaml
transform_ops:
  - DecodeImage:
      to_rgb: True
      channel_first: False
  - RandCropImage:
      size: 224
  - RandFlipImage:
      flip_code: 1
  - RandAugment:
      num_layers: 2
      magnitude: 5
  - NormalizeImage:
      scale: 1.0/255.0
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
```
<a name="1.3"></a>
### 1.3 TimmAutoAugment
The configuration of the `TimmAutoAugment` data augmentation method is as follows, in which the user needs to specify the parameters `config_str`, `interpolation`, and `img_size`, whose default values are `rand-m9-mstd0.5-inc1`, `bicubic`, and `224` respectively. `TimmAutoAugment` operates on data in uint8 format, so it should be placed before the normalization operation (`NormalizeImage`).
```yaml
transform_ops:
  - DecodeImage:
      to_rgb: True
      channel_first: False
  - RandCropImage:
      size: 224
  - RandFlipImage:
      flip_code: 1
  - TimmAutoAugment:
      config_str: rand-m9-mstd0.5-inc1
      interpolation: bicubic
      img_size: 224
  - NormalizeImage:
      scale: 1.0/255.0
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
```
<a name="1.4"></a>
### 1.4 Cutout
The configuration of the `Cutout` data augmentation method is as follows, where the user needs to specify the parameters `n_holes` and `length`, whose default values are `1` and `112` respectively. Like other image-cropping data augmentation methods, `Cutout` can operate either on data in uint8 format or on data after normalization (`NormalizeImage`). The example here applies it after normalization.
```yaml
transform_ops:
  - DecodeImage:
      to_rgb: True
      channel_first: False
  - RandCropImage:
      size: 224
  - RandFlipImage:
      flip_code: 1
  - NormalizeImage:
      scale: 1.0/255.0
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
  - Cutout:
      n_holes: 1
      length: 112
```
<a name="1.5"></a>
### 1.5 RandomErasing
The configuration of the `RandomErasing` image augmentation method is as follows, where the user needs to specify the parameters `EPSILON`, `sl`, `sh`, `r1`, `attempt`, `use_log_aspect`, and `mode`, whose default values are `0.25`, `0.02`, `1.0/3.0`, `0.3`, `10`, `True`, and `pixel` respectively. Like other image-cropping data augmentation methods, `RandomErasing` can operate either on data in uint8 format or on data after normalization (`NormalizeImage`). The example here applies it after normalization.
```yaml
transform_ops:
  - DecodeImage:
      to_rgb: True
      channel_first: False
  - RandCropImage:
      size: 224
  - RandFlipImage:
      flip_code: 1
  - NormalizeImage:
      scale: 1.0/255.0
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
  - RandomErasing:
      EPSILON: 0.25
      sl: 0.02
      sh: 1.0/3.0
      r1: 0.3
      attempt: 10
      use_log_aspect: True
      mode: pixel
```
<a name="1.6"></a>
### 1.6 HideAndSeek
The configuration of the `HideAndSeek` image augmentation method is as follows. Like other image-cropping data augmentation methods, `HideAndSeek` can operate either on data in uint8 format or on data after normalization (`NormalizeImage`). The example here applies it after normalization.
```yaml
transform_ops:
  - DecodeImage:
      to_rgb: True
      channel_first: False
  - RandCropImage:
      size: 224
  - RandFlipImage:
      flip_code: 1
  - NormalizeImage:
      scale: 1.0/255.0
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
  - HideAndSeek:
```
<a name="1.7"></a>
### 1.7 GridMask
The configuration of the `GridMask` image augmentation method is as follows, where the user needs to specify the parameters `d1`, `d2`, `rotate`, `ratio`, and `mode`, whose default values are `96`, `224`, `1`, `0.5`, and `0` respectively. Like other image-cropping data augmentation methods, `GridMask` can operate either on data in uint8 format or on data after normalization (`NormalizeImage`). The example here applies it after normalization.
```yaml
transform_ops:
  - DecodeImage:
      to_rgb: True
      channel_first: False
  - RandCropImage:
      size: 224
  - RandFlipImage:
      flip_code: 1
  - NormalizeImage:
      scale: 1.0/255.0
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
  - GridMask:
      d1: 96
      d2: 224
      rotate: 1
      ratio: 0.5
      mode: 0
```
<a name="1.8"></a>
### 1.8 Mixup
The configuration of the `Mixup` data augmentation method is as follows, where the user needs to specify the parameter `alpha`, whose default value is `0.2`. Like other image-mixing data augmentation methods, `Mixup` mixes the images within each batch after per-image processing is finished, and the mixed images and labels are fed into the network for training, so it is applied after the per-image transforms (image transformation, image cropping). A minimal sketch of the mixing operation is shown after the configuration below.
```yaml
transform_ops:
  - DecodeImage:
      to_rgb: True
      channel_first: False
  - RandCropImage:
      size: 224
  - RandFlipImage:
      flip_code: 1
  - NormalizeImage:
      scale: 1.0/255.0
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
batch_transform_ops:
  - MixupOperator:
      alpha: 0.2
```
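The batch-level mixing itself is simple. Below is a minimal sketch of what a mixup operator does (an illustration only, with our own function name, not the actual `MixupOperator` implementation):
```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2):
    """Mix a batch: `images` is (N, C, H, W), `labels` is (N,) integer class ids."""
    lam = np.random.beta(alpha, alpha)             # mixing coefficient
    perm = np.random.permutation(images.shape[0])  # pair each image with a random partner
    mixed = lam * images + (1.0 - lam) * images[perm]
    # The loss is then lam * CE(pred, labels) + (1 - lam) * CE(pred, labels[perm]).
    return mixed, labels, labels[perm], lam
```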
<a name="1.9"></a>
### 1.9 Cutmix
The configuration of the `Cutmix` image augmentation method is as follows, where the user needs to specify the parameter `alpha`, whose default value is `0.2`. Like other image-mixing data augmentation methods, `Cutmix` mixes the images within each batch after per-image processing is finished, and the mixed images and labels are fed into the network for training, so it is applied after the per-image transforms (image transformation, image cropping).
```yaml
transform_ops:
  - DecodeImage:
      to_rgb: True
      channel_first: False
  - RandCropImage:
      size: 224
  - RandFlipImage:
      flip_code: 1
  - NormalizeImage:
      scale: 1.0/255.0
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
batch_transform_ops:
  - CutmixOperator:
      alpha: 0.2
```
<a name="1.10"></a>
### 1.10 Use Mixup and Cutmix at the same time
The configuration for using both `Mixup` and `Cutmix` is as follows, in which the user needs to specify an additional parameter `prob`, which controls the probability of applying each augmentation, with a default value of `0.5`. A minimal sketch of this sampling is shown after the configuration below.
```yaml
transform_ops:
  - DecodeImage:
      to_rgb: True
      channel_first: False
  - RandCropImage:
      size: 224
  - RandFlipImage:
      flip_code: 1
  - NormalizeImage:
      scale: 1.0/255.0
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      order: ''
  - OpSampler:
      MixupOperator:
        alpha: 0.8
        prob: 0.5
      CutmixOperator:
        alpha: 1.0
        prob: 0.5
```
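Conceptually, `OpSampler` draws at most one of the configured batch operators per batch according to `prob`. A minimal sketch of such sampling (an illustration only, not the actual `OpSampler` code) could look like this:
```python
import random

def sample_batch_op(ops):
    """ops: list of (operator, prob) pairs whose probabilities sum to at most 1.0."""
    r, acc = random.random(), 0.0
    for op, prob in ops:
        acc += prob
        if r < acc:
            return op    # apply this operator to the current batch
    return None          # no mixing for this batch
```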
<a name="2"></a>
## 2. Start training
After you configure the training environment, similar to training other classification tasks, you only need to replace the configuration file in `tools/train.sh` with the configuration file of the corresponding data augmentation method.
The contents of `train.sh` are as follows:
```bash
python3 -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
--log_dir=ResNet50_Cutout \
tools/train.py \
-c ./ppcls/configs/ImageNet/DataAugment/ResNet50_Cutout.yaml
```
Run `train.sh`:
```bash
sh tools/train.sh
```
<a name="3"></a>
## 3. Matters needing attention
* Because the labels also need to be mixed when the images are mixed, the training accuracy cannot be computed, so it is not printed during training.
* The training data is harder to fit with data augmentation, so the training loss may be larger and the training-set accuracy relatively lower, but the model generalizes better, so the validation-set accuracy is relatively higher.
* With data augmentation, the model may tend to underfit. It is recommended to reduce `l2_decay` for better performance on the validation set.
* Hyperparameters exist in almost all augmentation methods. Here we provide hyperparameters for the ImageNet1k dataset; users may need to fine-tune them on their own datasets. More training tricks can be found in [Tricks](../models_training/train_strategy_en.md).
> If this document is helpful to you, welcome to star our project: [https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
<a name="4"></a>
## 4. Experiments
Based on PaddleClas, the metrics of different augmentation methods on the ImageNet1k dataset are as follows.
| Model | Learning strategy | l2 decay | batch size | epoch | Augmentation method | Top1 Acc | Reference |
|-------------|------------------|--------------|------------|-------|----------------|------------|----|
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | Standard transform | 0.7731 | - |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | AutoAugment | 0.7795 | 0.7763 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | mixup | 0.7828 | 0.7790 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | cutmix | 0.7839 | 0.7860 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | cutout | 0.7801 | - |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | gridmask | 0.7785 | 0.7790 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | random-augment | 0.7770 | 0.7760 |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | random erasing | 0.7791 | - |
| ResNet50 | 0.1/cosine_decay | 0.0001 | 256 | 300 | hide and seek | 0.7743 | 0.7720 |
**Note**:
* In the experiments here, for better comparison, we fixed the l2 decay to 1e-4. To achieve higher accuracy, we recommend trying a smaller l2 decay. Combined with data augmentation, we found that reducing l2 decay from 1e-4 to 7e-5 can bring at least a 0.3%~0.5% accuracy improvement.
* We have not yet combined or verified different strategies together, which is left as future work.

@@ -0,0 +1,303 @@
# Code Overview
## Catalogue
- [1. Overview of Code and Content](#1)
- [2. Training Module](#2)
- [2.1 Data](#2.1)
- [2.2 Model Structure](#2.2)
- [2.3 Loss Function](#2.3)
- [2.4 Optimizer, Learning Rate Decay, and Weight Decay](#2.4)
- [2.5 Evaluation During Training](#2.5)
- [2.6 Model Saving](#2.6)
- [2.7 Model Pruning and Quantization](#2.7)
- [3. Codes and Methods for Inference and Deployment](#3)
<a name="1"></a>
## 1. Overview of Code and Content
The main code and content structure of PaddleClas are as follows:
- benchmark: shell scripts to test the speed metrics of different models in PaddleClas, such as single-card training speed metrics, multi-card training speed metrics, etc.
- dataset: datasets and the scripts used to process datasets. The scripts are responsible for processing the dataset into a suitable format for Dataloader.
- deploy: code for deployment, including deployment tools, which support Python/C++ inference, Hub Serving, Paddle Lite, Slim offline quantization, and other deployment methods.
- ppcls: code for training and evaluation which is the main body of the PaddleClas framework. It also contains configuration files, and specific code of model training, evaluation, inference, dynamic to static export, etc.
- tools: entry functions and scripts for training, evaluation, inference, and dynamic to static export.
- requirements.txt: the dependencies required by PaddleClas; use pip to install or upgrade them.
- test_tipc: TIPC tests of PaddleClas models, from training to prediction, to verify whether each function works properly.
<a name="2"></a>
## 2. Training Module
Training a deep learning model mainly involves the data, the model structure, the loss function, and strategies such as the optimizer, learning rate decay, and weight decay, which are explained below.
<a name="2.1"></a>
### 2.1 Data
For supervised tasks, the training data generally contains the raw data and its annotation.
In a single-label-based image classification task, the raw data refers to the image data,
while the annotation is the class to which the image data belongs. In PaddleClas, a label file,
in the following format, is required for training,
with each row containing one training sample and separated by a separator (space by default),
representing the image path and the class label respectively.
```
train/n01440764/n01440764_10026.JPEG 0
train/n01440764/n01440764_10027.JPEG 0
```
`ppcls/data/dataloader/common_dataset.py` contains the `CommonDataset` class, which inherits from `paddle.io.Dataset` and can index and fetch a given sample by a key value. Dataset classes such as `ImageNetDataset` and `LogoDataset` inherit from `CommonDataset`.
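As a rough illustration of how such a dataset reads the label file shown above, here is a minimal sketch (our own simplified class, not the actual `CommonDataset` implementation):
```python
import os
import paddle
from PIL import Image

class SimpleImageDataset(paddle.io.Dataset):
    """Reads lines of the form '<relative/path> <label>' separated by a delimiter."""

    def __init__(self, image_root, cls_label_path, transforms=None, delimiter=" "):
        super().__init__()
        self.image_root = image_root
        self.transforms = transforms
        with open(cls_label_path) as f:
            self.samples = [line.strip().split(delimiter) for line in f if line.strip()]

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(os.path.join(self.image_root, path)).convert("RGB")
        if self.transforms is not None:
            img = self.transforms(img)
        return img, int(label)

    def __len__(self):
        return len(self.samples)
```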
The raw image needs to be preprocessed before training.
The standard data preprocessing during training contains
`DecodeImage`, `RandCropImage`, `RandFlipImage`, `NormalizeImage`, and `ToCHWImage`.
The data preprocessing is configured as a list in the `transform_ops` field, and the operators are applied to the data in order, as shown in the configuration file below.
```yaml
DataLoader:
  Train:
    dataset:
      name: ImageNetDataset
      image_root: ./dataset/ILSVRC2012/
      cls_label_path: ./dataset/ILSVRC2012/train_list.txt
      transform_ops:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - RandCropImage:
            size: 224
        - RandFlipImage:
            flip_code: 1
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
```
PaddleClas also contains `AutoAugment`, `RandAugment`, and other data augmentation methods,
which can also be configured in the configuration file and thus added to the data preprocessing of the training.
Each data augmentation and process method is implemented as a class for easy migration and reuse.
For more specific implementation of data processing, please refer to the code under `ppcls/data/preprocess/ops/`.
You can also use methods such as mixup or cutmix to augment the data that make up a batch.
PaddleClas integrates `MixupOperator`, `CutmixOperator`, `FmixOperator`, and other batch-based data augmentation methods, which can be enabled by setting the mix parameter in the configuration file. For the code implementation, please refer to `ppcls/data/preprocess/batch_ops/batch_operators.py`.
In image classification, the data post-processing is mainly the `argmax` operation, which is not elaborated here.
<a name="2.2"></a>
### 2.2 Model Structure
The model in the configuration file is structured as follows:
```yaml
Arch:
  name: ResNet50
  class_num: 1000
  pretrained: False
  use_ssld: False
```
- `Arch.name`: the name of the model
- `Arch.pretrained`: whether to add a pre-trained model
- `Arch.use_ssld`: whether to use a pre-trained model based on `SSLD` knowledge distillation
All model names are defined in `ppcls/arch/backbone/__init__.py`.
Correspondingly, the model object is created in `ppcls/arch/__init__.py` with the `build_model` method.
```python
def build_model(config):
    config = copy.deepcopy(config)
    model_type = config.pop("name")
    mod = importlib.import_module(__name__)
    arch = getattr(mod, model_type)(**config)
    return arch
```
<a name="2.3"></a>
### 2.3 Loss Function
PaddleClas implements `CELoss`, `JSDivLoss`, `TripletLoss`, `CenterLoss`, and other loss functions, all defined in `ppcls/loss`.
In the `ppcls/loss/__init__.py` file, `CombinedLoss` is used to construct and combine loss functions.
The loss functions and calculation methods required in different training strategies are disparate,
and the following factors are considered by PaddleClas in the construction of the loss function.
1. whether to use label smooth
2. whether to use mixup or cutmix
3. whether to use distillation method for training
4. whether to train metric learning
Users can specify the type and weight of each loss function in the configuration file. For example, to add `TripletLossV2` to the training, configure it as follows (a minimal sketch of how weighted losses are combined follows the configuration):
```yaml
Loss:
  Train:
    - CELoss:
        weight: 1.0
    - TripletLossV2:
        weight: 1.0
        margin: 0.5
```
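Conceptually, the combined loss is just a weighted sum of the configured losses. Here is a minimal sketch (our own simplified layer, not the actual `CombinedLoss` implementation; `TripletLossV2` would additionally require the feature embeddings):
```python
import paddle.nn as nn

class WeightedLoss(nn.Layer):
    """Sums a list of (loss_layer, weight) pairs over the same (logits, labels)."""

    def __init__(self, losses):
        super().__init__()
        self.losses = losses

    def forward(self, logits, labels):
        total = 0.0
        for loss_fn, weight in self.losses:
            total = total + weight * loss_fn(logits, labels)
        return total

# Example: a single cross-entropy term with weight 1.0.
criterion = WeightedLoss([(nn.CrossEntropyLoss(), 1.0)])
```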
<a name="2.4"></a>
### 2.4 Optimizer, Learning Rate Decay, and Weight Decay
In image classification tasks, `Momentum` is a commonly used optimizer,
and several optimizer strategies such as `Momentum`, `RMSProp`, `Adam`, and `AdamW` are provided in PaddleClas.
The weight decay strategy is a common regularization method, mainly adopted to prevent model overfitting.
Two weight decay strategies, `L1Decay` and `L2Decay`, are provided in PaddleClas.
Learning rate decay is an essential training method for accuracy improvement in image classification tasks.
PaddleClas currently supports `Cosine`, `Piecewise`, `Linear`, and other learning rate decay strategies.
In the configuration file, the optimizer, weight decay,
and learning rate decay strategies can be configured with the following fields.
```yaml
Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Piecewise
    learning_rate: 0.1
    decay_epochs: [30, 60, 90]
    values: [0.1, 0.01, 0.001, 0.0001]
  regularizer:
    name: 'L2'
    coeff: 0.0001
```
Employ `build_optimizer` in `ppcls/optimizer/__init__.py` to create the optimizer and learning rate objects.
```python
def build_optimizer(config, epochs, step_each_epoch, parameters):
    config = copy.deepcopy(config)
    # step1 build lr
    lr = build_lr_scheduler(config.pop('lr'), epochs, step_each_epoch)
    logger.debug("build lr ({}) success..".format(lr))
    # step2 build regularization
    if 'regularizer' in config and config['regularizer'] is not None:
        reg_config = config.pop('regularizer')
        reg_name = reg_config.pop('name') + 'Decay'
        reg = getattr(paddle.regularizer, reg_name)(**reg_config)
    else:
        reg = None
    logger.debug("build regularizer ({}) success..".format(reg))
    # step3 build optimizer
    optim_name = config.pop('name')
    if 'clip_norm' in config:
        clip_norm = config.pop('clip_norm')
        grad_clip = paddle.nn.ClipGradByNorm(clip_norm=clip_norm)
    else:
        grad_clip = None
    optim = getattr(optimizer, optim_name)(learning_rate=lr,
                                           weight_decay=reg,
                                           grad_clip=grad_clip,
                                           **config)(parameters=parameters)
    logger.debug("build optimizer ({}) success..".format(optim))
    return optim, lr
```
Different optimizers and weight decay strategies are implemented as classes,
which can be found in the file `ppcls/optimizer/optimizer.py`.
Different learning rate decay strategies can be found in the file `ppcls/optimizer/learning_rate.py`.
<a name="2.5"></a>
### 2.5 Evaluation During Training
When training the model, you can set the interval of model saving,
or you can evaluate the validation set every several epochs so that the model with the best accuracy can be saved.
Follow the examples below to configure.
```
Global:
  save_interval: 1        # epoch interval of model saving
  eval_during_train: True # whether to evaluate during training
  eval_interval: 1        # epoch interval of evaluation
```
<a name="2.6"></a>
### 2.6 Model Saving
The model is saved through the `paddle.save()` function of the Paddle framework.
The dynamic graph version of the model is saved in the form of a dictionary to facilitate further training.
The specific implementation is as follows:
```python
def save_model(program, model_path, epoch_id, prefix='ppcls'):
    model_path = os.path.join(model_path, str(epoch_id))
    _mkdir_if_not_exist(model_path)
    model_prefix = os.path.join(model_path, prefix)
    paddle.static.save(program, model_prefix)
    logger.info(
        logger.coloring("Already save model in {}".format(model_path), "HEADER"))
```
When saving, there are two things to keep in mind:
1. Only save the model on node 0. Otherwise, if all nodes save models to the same path, a file conflict may occur during multi-card training when multiple nodes write files, preventing the final saved model from being loaded correctly (a minimal sketch is shown below).
2. Optimizer parameters also need to be saved so that training can later be resumed from the checkpoint.
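The following minimal sketch illustrates both points, i.e. rank-0-only checkpointing that also saves optimizer state (an illustration, not PaddleClas's own `save_model`):
```python
import paddle
import paddle.distributed as dist

def save_checkpoint(model, optimizer, path):
    if dist.get_rank() != 0:
        return  # only the first card writes the files
    paddle.save(model.state_dict(), path + ".pdparams")     # model weights
    paddle.save(optimizer.state_dict(), path + ".pdopt")    # optimizer state for resuming
```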
<a name="2.7"></a>
### 2.7 Model Pruning and Quantization
If you want to conduct compression training, please configure it with the following fields.
1. Model pruning
```yaml
Slim:
  prune:
    name: fpgm
    pruned_ratio: 0.3
```
2. Model quantization
```yaml
Slim:
  quant:
    name: pact
```
For details of the training method, see [Pruning and Quantization Application](model_prune_quantization_en.md),
and the algorithms are described in [Pruning and Quantization Algorithms](model_prune_quantization_en.md).
<a name="3"></a>
## 3. Codes and Methods for Inference and Deployment
- If you wish to quantize the classification model offline, please refer to the
[Model Pruning and Quantization Tutorial](model_prune_quantization_en.md) for offline quantization.
- If you wish to use python for server-side deployment,
please refer to [Python Inference Tutorial](../inference_deployment/python_deploy_en.md).
- If you wish to use cpp for server-side deployment,
please refer to [Cpp Inference Tutorial](../inference_deployment/cpp_deploy_en.md).
- If you wish to deploy the classification model as a service,
please refer to the [Hub Serving Inference Deployment Tutorial](../inference_deployment/paddle_hub_serving_deploy_en.md).
- If you wish to use classification models for inference on mobile,
please refer to the [PaddleLite Inference Deployment Tutorial](../inference_deployment/paddle_lite_deploy_en.md).
- If you wish to use the whl package for inference of classification models,
please refer to [whl Package Inference](../inference_deployment/whl_deploy_en.md).

@@ -0,0 +1,245 @@
# Knowledge Distillation
## Introduction of model compression methods
In recent years, deep neural networks have proven to be an extremely effective way to solve problems in computer vision and natural language processing. With a suitable network structure and training process, deep learning methods perform better than traditional methods.
With enough training data, increasing the number of parameters by building a reasonably larger network can significantly improve model performance, but this also increases model complexity, which costs too much computation in real scenarios.
Parameter redundancy exists in deep neural networks, and there are several methods to compress models, such as pruning, quantization, and knowledge distillation. Knowledge distillation uses a teacher model to guide a student model to learn a specific task, so that the small model obtains a relatively large improvement, and may even reach accuracy similar to the large model, with the computation cost unchanged [1]. Combining some of the existing distillation methods [2,3], PaddleClas provides a simple semi-supervised label knowledge distillation solution (SSLD). The Top-1 accuracy on the ImageNet1k dataset improves by more than 3% on the ResNet_vd and MobileNet series, as shown below.
![](../../../images/distillation/distillation_perform_s.jpg)
## SSLD
### Introduction
The following figure shows the framework of SSLD.
![](../../../images/distillation/ppcls_distillation.png)
First, we select nearly 4 million images from ImageNet22k dataset, and integrate it with the ImageNet-1k training set to get a new dataset containing 5 million images. Then, we combine the student model and the teacher model into a new network, which outputs the predictions of the student model and the teacher model, respectively. The gradient of the entire network of the teacher model is fixed. Finally, we use JS divergence loss as the loss function for the training process. Here we take MobileNetV3 distillation task as an example, and introduce key points of SSLD.
* Choice of the teacher model. During knowledge distillation, it may not be an optimal solution if the structures of the teacher model and the student model differ too much. With the same structure, a teacher model with higher accuracy leads to better performance of the student model during distillation. Compared with the 79.12% ResNet50_vd teacher model, using the 82.4% teacher model brings a 0.4% improvement in Top-1 accuracy (`75.6% -> 76.0%`).
* Improvement of the loss function. The most commonly used loss function for classification is cross-entropy loss. We find that when training with soft labels, KL divergence loss is almost useless for improving model performance compared with cross-entropy loss, but the accuracy improves by 0.2% when using JS divergence loss (`76.0% -> 76.2%`). The loss function used in SSLD is therefore the JS divergence loss; a minimal sketch of it is shown after this list.
* More iterations. The baseline experiment uses only 120 epochs; setting it to 360 achieves a 0.9% improvement (`76.2% -> 77.1%`).
* No need for labeled data in SSLD, which makes it convenient to expand the training data. The label is not used when computing the loss function, so unlabeled data can also be used to train the network. The label-free distillation strategy of this solution also greatly raises the upper performance limit of student models (`77.1% -> 78.5%`).
* ImageNet1k fine-tuning. The ImageNet1k training set is used for fine-tuning, which brings a 0.4% accuracy improvement (`78.5% -> 78.9%`).
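The JS divergence between the teacher and student output distributions can be sketched as follows (an illustration only; the function and parameter names are ours, not the `JSDivLoss` implementation in PaddleClas):
```python
import paddle
import paddle.nn.functional as F

def js_div_loss(student_logits, teacher_logits, eps=1e-8):
    """Jensen-Shannon divergence between the two softmax distributions."""
    p = F.softmax(student_logits, axis=-1)
    q = F.softmax(teacher_logits, axis=-1)
    m = 0.5 * (p + q)
    kl_pm = paddle.sum(p * (paddle.log(p + eps) - paddle.log(m + eps)), axis=-1)
    kl_qm = paddle.sum(q * (paddle.log(q + eps) - paddle.log(m + eps)), axis=-1)
    return paddle.mean(0.5 * kl_pm + 0.5 * kl_qm)
```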
### Data selection
* An important feature of the SSLD distillation scheme is that it needs no labeled images, so the dataset size can be expanded arbitrarily. Considering the limitation of computing resources, we only expand the training set of the distillation task based on the ImageNet22k dataset. For SSLD, we use the `Top-k per class` data sampling scheme [3] (a minimal sketch follows this list). The specific steps are as follows.
  * Deduplication of the training set. We first deduplicate the ImageNet22k dataset against the ImageNet1k validation set using a SIFT-feature similarity matching method, to prevent the added ImageNet22k training set from containing ImageNet1k validation images. In the end, 4511 similar images were removed. Some of the filtered similar images are shown below.
![](../../../images/distillation/22k_1k_val_compare_w_sift.png)
  * Obtain the soft labels of the ImageNet22k dataset. For the deduplicated ImageNet22k dataset, we use the `ResNeXt101_32x16d_wsl` model to predict the soft label of each image.
  * Top-k data selection. There are 1,000 categories in the ImageNet1k dataset. For each category, we find the images with the Top-k highest scores for that category, which yields a dataset of at most `1000 * k` images (some categories may have fewer than k images).
  * The selected images are merged with the ImageNet1k training set to form the new dataset used for the final distillation model training, which contains 5 million images in total.
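The Top-k per class selection can be sketched as follows (an illustration only; variable names are ours):
```python
import numpy as np

def topk_per_class(soft_labels, k):
    """soft_labels: (num_images, 1000) teacher scores for the unlabeled images."""
    selected = set()
    for c in range(soft_labels.shape[1]):
        topk_idx = np.argsort(-soft_labels[:, c])[:k]  # images scoring highest for class c
        selected.update(topk_idx.tolist())
    return sorted(selected)                            # indices of the images to keep
```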
## Experiments
The distillation solution that PaddleClas provides combines standard training with fine-tuning: given a suitable teacher model, the large dataset (5 million images) is used for standard training and the ImageNet1k dataset is used for fine-tuning.
### Choice of teacher model
In order to verify the influence of the model size difference between the teacher model and the student model on the distillation results as well as the teacher model accuracy, we conducted several experiments. The training strategy is unified as follows: `cosine_decay_warmup, lr = 1.3, epoch = 120, bs = 2048`, and the student models are all trained from scratch.
|Teacher Model | Teacher Top1 | Student Model | Student Top1|
|- |:-: |:-: | :-: |
| ResNeXt101_32x16d_wsl | 84.2% | MobileNetV3_large_x1_0 | 75.78% |
| ResNet50_vd | 79.12% | MobileNetV3_large_x1_0 | 75.60% |
| ResNet50_vd | 82.35% | MobileNetV3_large_x1_0 | 76.00% |
It can be seen from the table that:
> When the teacher model structure is the same, the higher the teacher model accuracy, the better the final student model will be.
>
> The size difference between the teacher model and the student model should not be too large, otherwise it will decrease the accuracy of the distillation results.
Therefore, during distillation, for the ResNet series student models, we use `ResNeXt101_32x16d_wsl` as the teacher model; for the MobileNet series student models, we use `ResNet50_vd_SSLD` as the teacher model.
### Distillation using large-scale dataset
The training process is carried out on the large-scale dataset of 5 million images. The following table shows more details for the different models.
|Student Model | num_epoch | l2_decay | batch size/gpu cards | base lr | learning rate decay | top1 acc |
| - |:-: |:-: | :-: |:-: |:-: |:-: |
| MobileNetV1 | 360 | 3e-5 | 4096/8 | 1.6 | cosine_decay_warmup | 77.65% |
| MobileNetV2 | 360 | 1e-5 | 3072/8 | 0.54 | cosine_decay_warmup | 76.34% |
| MobileNetV3_large_x1_0 | 360 | 1e-5 | 5760/24 | 3.65625 | cosine_decay_warmup | 78.54% |
| MobileNetV3_small_x1_0 | 360 | 1e-5 | 5760/24 | 3.65625 | cosine_decay_warmup | 70.11% |
| ResNet50_vd | 360 | 7e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 82.07% |
| ResNet101_vd | 360 | 7e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 83.41% |
| Res2Net200_vd_26w_4s | 360 | 4e-5 | 1024/32 | 0.4 | cosine_decay_warmup | 84.82% |
### Fine-tuning using ImageNet1k
Fine-tuning is carried out on the ImageNet1k dataset to reduce the distribution gap between the training set and the test set. The following table shows more details of the fine-tuning.
|Student Model | num_epoch | l2_decay | batch size/gpu cards | base lr | learning rate decay | top1 acc |
| - |:-: |:-: | :-: |:-: |:-: |:-: |
| MobileNetV1 | 30 | 3e-5 | 4096/8 | 0.016 | cosine_decay_warmup | 77.89% |
| MobileNetV2 | 30 | 1e-5 | 3072/8 | 0.0054 | cosine_decay_warmup | 76.73% |
| MobileNetV3_large_x1_0 | 30 | 1e-5 | 2048/8 | 0.008 | cosine_decay_warmup | 78.96% |
| MobileNetV3_small_x1_0 | 30 | 1e-5 | 6400/32 | 0.025 | cosine_decay_warmup | 71.28% |
| ResNet50_vd | 60 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 82.39% |
| ResNet101_vd | 30 | 7e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 83.73% |
| Res2Net200_vd_26w_4s | 360 | 4e-5 | 1024/32 | 0.004 | cosine_decay_warmup | 85.13% |
### Data augmentation and the Fix strategy
* Based on the experiments above, we add AutoAugment [4] during training and reduce l2_decay from 4e-5 to 2e-5. The Top-1 accuracy on the ImageNet1k dataset then reaches 82.99%, a 0.6% improvement over the standard SSLD distillation strategy.
* For image classification tasks, model accuracy can be further improved when the test scale is 1.15 times the training scale [5]. For the 82.99% ResNet50_vd pretrained model, evaluating at 320x320 raises the accuracy to 83.7%. We then use the Fix strategy to fine-tune the model with the training scale set to 320x320, keeping the preprocessing pipeline the same for training and test and freezing all weights except the fully connected layer. The final Top-1 accuracy reaches **84.0%**.
### Some phenomena during the experiment
In the prediction process, the mean and variance of batch norm are loaded from the pretrained model (test mode). In the training process, batch norm statistics are computed from the current batch (train mode), and a moving average is maintained together with the historical values. In the distillation task, we found that letting the teacher model run in train mode, so that its batch norm parameters change in real time to guide the student model, works better than distilling with the teacher in test mode. The following is a set of experimental results. Therefore, in this distillation scheme, we use train mode to obtain the soft labels of the teacher model; a minimal sketch is shown after the table.
|Teacher Model | Teacher Top1 | Student Model | Student Top1|
|- |:-: |:-: | :-: |
| ResNet50_vd | 82.35% | MobileNetV3_large_x1_0 | 76.00% |
| ResNet50_vd | 82.35% | MobileNetV3_large_x1_0 | 75.84% |
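The idea can be sketched as follows (an illustration only, not the PaddleClas trainer; `teacher` is any loaded teacher network):
```python
import paddle
import paddle.nn.functional as F

def teacher_soft_label(teacher, images):
    teacher.train()           # use the current batch's BN statistics, as described above
    with paddle.no_grad():    # the teacher's gradients stay fixed
        soft_label = F.softmax(teacher(images), axis=-1)
    return soft_label
```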
## Application of the distillation model
### Instructions
* Adjust the learning rate of the middle layers. The middle-layer feature maps of a model obtained by distillation are more refined, so when the distillation model is used as a pretrained model for other tasks, keeping the previous learning rate easily destroys these features, while lowering the learning rate of the whole model slows convergence. We therefore adjust the learning rate of the middle layers. Specifically:
  * For ResNet50_vd, we set up a learning rate list: the three conv2d layers before the residual blocks share a uniform learning rate multiplier, and each of the four residual stages has its own multiplier, so 5 values need to be set in the list (a minimal sketch of setting a per-layer multiplier is shown after this list). Experiments show that when fine-tuning a classification model for transfer learning, the learning rate list `[0.1, 0.1, 0.2, 0.2, 0.3]` performs better in most tasks, while in object detection tasks `[0.05, 0.05, 0.05, 0.1, 0.15]` brings larger accuracy gains.
  * For MobileNetV3_large_x1_0, which contains 15 blocks, we let every 3 blocks share a learning rate multiplier, so 5 values are also required. We find that in classification and detection tasks, the learning rate list `[0.25, 0.25, 0.5, 0.5, 0.75]` performs better in most tasks.
* Appropriate l2 decay. Different l2 decay values are set for different models during training. To prevent overfitting, l2 decay is often set larger for large models: it is `1e-4` for ResNet50 and `1e-5 ~ 4e-5` for the MobileNet series. L2 decay also needs to be adjusted when the model is applied to other tasks. Taking Faster_RCNN_MobileNetV3_FPN as an example, we found that modifying only the l2 decay can bring up to a 0.5% mAP improvement on the COCO2017 dataset.
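In PaddlePaddle, a per-layer learning rate multiplier can be attached to a parameter through `ParamAttr`. The following is a minimal sketch of the idea (an illustration only; PaddleClas configures the multipliers through its own model code, and the layer shapes here are arbitrary):
```python
import paddle
import paddle.nn as nn

lr_mult = 0.1  # e.g. the first value of the list [0.1, 0.1, 0.2, 0.2, 0.3]
conv = nn.Conv2D(
    in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3,
    weight_attr=paddle.ParamAttr(learning_rate=lr_mult),  # global lr is scaled by lr_mult
    bias_attr=False)
```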
### Transfer learning
* To verify the effect of the SSLD pretrained model in transfer learning, we carried out experiments on 10 small datasets. To ensure the comparability of the experiments, we use the same standard preprocessing as ImageNet1k training. For the distillation model, we also add a simple search over the learning rates of the middle layers of the distillation pretrained model.
* For ResNet50_vd, the Top-1 accuracy of the baseline pretrained model is 79.12%, and the other parameters are obtained by grid search. For the distillation pretrained model, we add the middle-layer learning rates to the search space. The following table shows the results.
| Dataset | Model | Baseline Top1 Acc | Distillation Model Finetune |
|- |:-: |:-: | :-: |
| Oxford102 flowers | ResNet50_vd | 97.18% | 97.41% |
| caltech-101 | ResNet50_vd | 92.57% | 93.21% |
| Oxford-IIIT-Pets | ResNet50_vd | 94.30% | 94.76% |
| DTD | ResNet50_vd | 76.48% | 77.71% |
| fgvc-aircraft-2013b | ResNet50_vd | 88.98% | 90.00% |
| Stanford-Cars | ResNet50_vd | 92.65% | 92.76% |
| SUN397 | ResNet50_vd | 64.02% | 68.36% |
| cifar100 | ResNet50_vd | 86.50% | 87.58% |
| cifar10 | ResNet50_vd | 97.72% | 97.94% |
| Food-101 | ResNet50_vd | 89.58% | 89.99% |
* It can be seen that on the above 10 datasets, combined with the appropriate middle layer learning rate, the distillation pretrained model can bring an average accuracy improvement of more than 1%.
### Object detection
Based on the two-stage Faster/Cascade RCNN model, we verify the effect of the pretrained model obtained by distillation.
* ResNet50_vd
The training and test scales are both set to 640x640, and some of the ablation studies are as follows.
| Model | train/test scale | pretrain top1 acc | feature map lr | coco mAP |
|- |:-: |:-: | :-: | :-: |
| Faster RCNN R50_vd FPN | 640/640 | 79.12% | [1.0,1.0,1.0,1.0,1.0] | 34.8% |
| Faster RCNN R50_vd FPN | 640/640 | 79.12% | [0.05,0.05,0.1,0.1,0.15] | 34.3% |
| Faster RCNN R50_vd FPN | 640/640 | 82.18% | [0.05,0.05,0.1,0.1,0.15] | 36.3% |
It can be seen that for the baseline pretrained model, excessive adjustment of the middle-layer learning rate actually reduces the performance of the detection model. Based on this distillation model, we also provide a practical server-side detection solution. The detailed configuration and training code are open source; more details can be found in [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/rcnn_enhance).
## Practice
This section introduces the SSLD distillation experiments in detail based on the ImageNet-1K dataset. If you want to experience this method quickly, you can refer to [**Quick start PaddleClas in 30 minutes**](../../tutorials/quick_start_en.md), whose dataset is Flowers102.
### Configuration
#### Distill MobileNetV3_small_x1_0 using MobileNetV3_large_x1_0
An example of SSLD distillation is provided here. The configuration file of `MobileNetV3_large_x1_0` distilling `MobileNetV3_small_x1_0` is provided in `ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml`, and the user can directly replace the path of the configuration file in `tools/train.sh` to use it.
The configuration for distilling `MobileNetV3_small_x1_0` using `MobileNetV3_large_x1_0` is as follows.
```yaml
Arch:
  name: "DistillationModel"
  # if not null, its lengths should be same as models
  pretrained_list:
  # if not null, its lengths should be same as models
  freeze_params_list:
    - True
    - False
  models:
    - Teacher:
        name: MobileNetV3_large_x1_0
        pretrained: True
        use_ssld: True
    - Student:
        name: MobileNetV3_small_x1_0
        pretrained: False
  infer_model_name: "Student"
```
In the configuration file, `freeze_params_list` specifies whether the parameters of each model are frozen, `models` specifies the teacher model and the student model, and the teacher model needs to load a pretrained model. Users can change the models here directly.
### Begin to train the network
If everything is ready, users can begin to train the network using the following command.
```bash
python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
--log_dir=mv3_large_x1_0_distill_mv3_small_x1_0 \
tools/train.py \
-c ./ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml
```
### Note
* Before using SSLD, users need to first train a teacher model on the target dataset. The teacher model is used to guide the training of the student model.
* If the student model is not loaded with a pretrained model, the other hyperparameters of the training can refer to the hyperparameters trained by the student model on ImageNet-1k. If the student model is loaded with the pre-trained model, the learning rate can be adjusted to `1/100~1/10` of the standard learning rate.
* In the process of SSLD distillation, the student model only learns the soft label, which makes the training process more difficult. It is recommended that the value of `l2_decay` can be decreased appropriately to obtain higher accuracy of the validation set.
* If users are going to add unlabeled training data, only the training list text file needs to be extended with the extra data.
> If this document is helpful to you, welcome to star our project: [https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
## Reference
[1] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[2] Bagherinezhad H, Horton M, Rastegari M, et al. Label refinery: Improving imagenet classification through label progression[J]. arXiv preprint arXiv:1805.02641, 2018.
[3] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.
[4] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
[5] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260.

@@ -0,0 +1,7 @@
distillation
================================
.. toctree::
   :maxdepth: 3

   distillation_en.md

@@ -0,0 +1,304 @@
# How to Contribute to the PaddleClas Community
------
## Catalogue
- [1. How to Contribute Code](#1)
- [1.1 Branches of PaddleClas](#1.1)
- [1.2 Commit Code to PaddleClas](#1.2)
- [1.2.1 Codes of Fork and Clone](#1.2.1)
- [1.2.2 Connect to the Remote Repository](#1.2.2)
- [1.2.3 Create the Local Branch](#1.2.3)
- [1.2.4 Employ Pre-commit Hook](#1.2.4)
- [1.2.5 Modify and Commit Code](#1.2.5)
- [1.2.6 Keep the Local Repository Updated](#1.2.6)
- [1.2.7 Push to Remote Repository](#1.2.7)
- [1.2.8 Commit Pull Request](#1.2.8)
- [1.2.9 CLA and Unit Test](#1.2.9)
- [1.2.10 Delete Branch](#1.2.10)
- [1.2.11 Conventions](#1.2.11)
- [2. Summary](#2)
- [3. References](#3)
<a name="1"></a>
## 1. How to Contribute Code
<a name="1.1"></a>
### 1.1 Branches of PaddleClas
PaddleClas maintains the following two branches:
- release/x.x series: Stable release branches, which are tagged with the release version of Paddle in due course.
The latest and the default branch is the release/2.3, which is compatible with Paddle v2.1.0.
The release/x.x series branches will continue to grow with future iterations;
the latest release is maintained by default, while the former ones only receive bug fixes, with no other branches covered.
- develop : developing branch, which is adapted to the develop version of Paddle and is mainly used for
developing new functions. A good choice for secondary development.
To ensure that the develop branch can pull out the release/x.x when needed,
only the API that is valid in Paddle's latest release branch can be adopted for its code.
In other words, if a new API has been developed in this branch but not yet in the release,
please do not use it in PaddleClas. Apart from that, features that do not involve the performance optimizations,
parameter adjustments, and policy updates of the API can be developed normally.
The historical branches of PaddleClas will no longer be maintained, but will be retained for existing users.
- release/static: This branch was used for static graph development and testing,
and is currently compatible with >=1.7 versions of Paddle.
It is still practicable for the special need of adapting an old version of Paddle,
but the code will not be updated except for bug fixing.
- dygraph-dev: This branch will no longer be maintained and accept no new code.
Please transfer to the develop branch as soon as possible.
PaddleClas welcomes code contributions to the repo, and the basic process is detailed in the next part.
<a name="1.2"></a>
### 1.2 Commit the Code to PaddleClas
<a name="1.2.1"></a>
#### 1.2.1 Codes of Fork and Clone
- Go to the home page of [PaddleClas GitHub](https://github.com/PaddlePaddle/PaddleClas) and click the
Fork button to generate a repository under your own account, such as `https://github.com/USERNAME/PaddleClas`.
![](../../images/quick_start/community/001_fork.png)
- Clone the remote repository to local
```shell
# Pull the code of the develop branch
git clone https://github.com/USERNAME/PaddleClas.git -b develop
cd PaddleClas
```
You can obtain the clone address as shown below.
![](../../images/quick_start/community/002_clone.png)
<a name="1.2.2"></a>
#### 1.2.2 Connect to the Remote Repository
First check the current information of the remote repository with `git remote -v`.
```shell
origin https://github.com/USERNAME/PaddleClas.git (fetch)
origin https://github.com/USERNAME/PaddleClas.git (push)
```
The above information only contains the cloned remote repository,
i.e., the PaddleClas under your username. Then we add the original PaddleClas repository as a remote named upstream.
```shell
git remote add upstream https://github.com/PaddlePaddle/PaddleClas.git
```
Adopt `git remote -v` to view the current information of the remote repository,
and 2 remote repositories including origin and upstream can be found, as shown below.
```shell
origin https://github.com/USERNAME/PaddleClas.git (fetch)
origin https://github.com/USERNAME/PaddleClas.git (push)
upstream https://github.com/PaddlePaddle/PaddleClas.git (fetch)
upstream https://github.com/PaddlePaddle/PaddleClas.git (push)
```
This is mainly to keep the local repository updated when committing a pull request (PR).
<a name="1.2.3"></a>
#### 1.2.3 Create the Local Branch
Run the following command to create a new local branch based on the current one.
```shell
git checkout -b new_branch
```
Or you can create new ones based on remote or upstream branches.
```shell
# Create new_branch based on the develop branch of origin (the user's remote repository)
git checkout -b new_branch origin/develop
# Create new_branch based on the develop branch of upstream
# If you need to create a new branch from upstream,
# please first employ git fetch upstream to fetch the upstream code
git checkout -b new_branch upstream/develop
```
The following output shows that it has switched to the new branch:
```
Branch new_branch set up to track remote branch develop from upstream.
Switched to a new branch 'new_branch'
```
<a name="1.2.4"></a>
#### 1.2.4 Employ Pre-commit Hook
Paddle developers adopt the pre-commit tool to manage Git pre-commit hooks.
It helps us format the source code (C++, Python) and automatically check some basic issues before committing,
e.g., only one EOL per file and no large files added to Git.
The pre-commit test is part of the unit tests in Travis-CI,
and PRs that do not satisfy the hook cannot be committed to PaddleClas.
Please install it first and run it in the current directory:
```
pip install pre-commit
pre-commit install
```
- **Note**
1. Paddle uses clang-format to format C/C++ source code, please make sure `clang-format` has a version of 3.8 or higher.
2. The `yapf` installed by `pip install pre-commit` differs slightly from the one installed by `conda install -c conda-forge pre-commit`;
PaddleClas developers use the former.
<a name="1.2.5"></a>
#### 1.2.5 Modify and Commit Code
You can check the changed files via `git status`. Follow the steps below to commit the `README.md` of PaddleClas after modification:
```
git add README.md
pre-commit
```
Repeat the above steps until the pre-commit format check does not report an error, as shown below.
![](../../images/quick_start/community/003_precommit_pass.png)
Run the following command to commit.
```
git commit -m "your commit info"
```
<a name="1.2.6"></a>
#### 1.2.6 Keep the Local Repository Updated
Get the latest code from upstream and update the current branch.
The upstream here was added in the `Connect to the Remote Repository` part of section 1.2.2.
```
git fetch upstream
# If you want to commit to another branch, please pull the code from another branch of upstream, in this case it is develop
git pull upstream develop
```
<a name="1.2.7"></a>
#### 1.2.7 Push to Remote Repository
```
git push origin new_branch
```
<a name="1.2.8"></a>
#### 1.2.8 Commit Pull Request
Click new pull request and select the local branch and the target branch,
as shown in the following figure. In the description of the PR, fill out what the PR accomplishes.
Next, wait for the review, and if any changes are required,
update the corresponding branch in origin by referring to the above steps.
![](../../images/quick_start/community/004_create_pr.png)
<a name="1.2.9"></a>
#### 1.2.9 CLA and Unit Test
- When you commit a Pull Request to PaddlePaddle for the first time,
you are required to sign a CLA (Contributor License Agreement) to ensure that your code can be merged.
Please follow the steps below to sign the CLA:
1. Please examine the Check section of your PR, find license/cla,
and click the detail on the right side to enter the CLA website
2. Click `Sign in with GitHub to agree` on the CLA website,
and you will be redirected back to your Pull Request page when you are done.
<a name="1.2.10"></a>
#### 1.2.10 Delete Branch
- Delete remote branch
When the PR is merged into the main repository, you can delete the remote branch from the PR page.
You can also delete the remote branch with `git push origin :branch_name`, e.g.
```
git push origin :new_branch
```
- Delete local branch
```
# Switch to the develop branch, otherwise the current branch cannot be deleted
git checkout develop
# Delete new_branch
git branch -D new_branch
```
<a name="1.2.11"></a>
#### 1.2.11 Conventions
To help official maintainers focus on the code itself when reviewing it,
please adhere to the following conventions each time you commit code:
1. Please pass the unit test in Travis-CI first.
Otherwise, the submitted code may have problems and usually receive no official review.
2. Before committing a Pull Request:
Note the number of commits.
Reason: If only one file is modified but more than a dozen commits are committed with a few changes for each,
this may overwhelm the reviewer for they need to check each and every commit for specific changes,
including the case that the changes between commits overwrite each other.
Recommendation: Minimize the number of commits each time, and add the last commit with `git commit --amend`.
For multiple commits that have been pushed to a remote repository, please refer to
[squash commits after push](https://stackoverflow.com/questions/5667884/how-to-squash-commits-in-git-after-they-have-been-pushed).
Please pay attention to the name of each commit:
it should reflect the content of the current commit without being too casual.
3. If an issue is resolved, please add `fix #issue_number` to the first comment box of the Pull Request,
so that the corresponding issue is closed automatically when the Pull Request is merged. Choose the appropriate keyword, such as close, closes, closed, fix, fixes, fixed, resolve, resolves, or resolved. See details in [Closing issues via commit messages](https://help.github.com/articles/closing-issues-via-commit-messages).
In addition, please stick to the following convention to respond to reviewers' comments:
1. Every review comment from the official maintainer is expected to be answered,
which will better enhance the contribution of the open source community.
- If you agree with the review and finish the corresponding modification, please simply return Done;
- If you disagree with the review, please give your reasons.
2. If there are plenty of review comments,
- Please present the revision in general.
- Please reply with `start a review` instead of replying to each comment directly, since otherwise every single reply triggers an email, which may be overwhelming.
<a name="2"></a>
## 2. Summary
- The open source community relies on the contributions and feedback of developers and users.
We highly appreciate that and look forward to your valuable comments and Pull Requests to PaddleClas in the hope that together we can build a leading practical and comprehensive code repository for image recognition!
<a name="3"></a>
## 3. References
1. [Guide to PaddlePaddle Local Development](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/08_contribution/index_en.html)
2. [Committing PR to Open Source Framework](https://blog.csdn.net/vim_wj/article/details/78300239)

@@ -0,0 +1,12 @@
advanced_tutorials
================================
.. toctree::
   :maxdepth: 2

   DataAugmentation_en.md
   distillation/index
   multilabel/index
   model_prune_quantization_en.md
   code_overview_en.md
   how_to_contribute_en.md

@@ -0,0 +1,180 @@
# Model Quantization and Pruning
Complex models are conducive to better model performance, but they may also lead to certain redundancy. This section presents ways to streamline the model, including model quantization (quantization training and offline quantization) and model pruning.
Model quantization converts full-precision weights to fixed-point numbers to reduce redundancy, simplify the model computation, and improve inference performance. It can shrink the size of the model parameters by converting their precision from FP32 to Int8 without losing model accuracy, and the accelerated computation gives the quantized model a speed advantage when deployed on mobile devices. A minimal sketch of the idea is shown below.
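To make the quantization idea concrete, here is a minimal sketch of symmetric FP32-to-Int8 quantization of a weight tensor (an illustration only, not PaddleSlim's implementation):
```python
import numpy as np

def quantize_int8(w):
    """Map FP32 weights to Int8 with a single symmetric scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale
```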
Model pruning decreases the number of model parameters by cutting out the unimportant convolutional kernels in the CNN, thus bringing down the computational complexity.
This tutorial explains how to use PaddleSlim, PaddlePaddle's model compression library, for PaddleClas compression, i.e., pruning and quantization. [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) integrates a variety of common and leading model compression functions such as model pruning, quantization (including quantization training and offline quantization), distillation, and neural network search. If you are interested, please follow us and learn more.
To start with, you are recommended to learn [PaddleClas Training](../models_training/classification_en.md) and [PaddleSlim](https://paddleslim.readthedocs.io/zh_CN/latest/index.html), see [Model Pruning and Quantization Algorithms](../algorithm_introduction/model_prune_quantization_en.md) for related pruning and quantization methods.
------
## Catalogue
- [1. Prepare the Environment](#1)
- [1.1 Install PaddleSlim](#1.1)
- [1.2 Prepare the Trained Model](#1.2)
- [2. Quick Start](#2)
- [2.1 Model Quantization](#2.1)
- [2.1.1 Online Quantization Training](#2.1.1)
- [2.1.2 Offline Quantization](#2.1.2)
- [2.2 Model Pruning](#2.2)
- [3. Export the Model](#3)
- [4. Deploy the Model](#4)
- [5. Hyperparameter Training](#5)
<a name="1"></a>
## 1. Prepare the Environment
Once a model has been trained, you can adopt quantization or pruning to further compress the model size and speed up the inference.
Five steps are included:
1. Install PaddleSlim
2. Prepare the trained model
3. Compress the model
4. Export quantized inference model
5. Inference and deployment of the quantized model
<a name="1.1"></a>
### 1.1 Install PaddleSlim
- You can adopt pip install for installation.
```
pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- You can also install it from the source code with the latest features of PaddleSlim.
```
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd Paddleslim
python3.7 setup.py install
```
<a name="1.2"></a>
### 1.2 Prepare the Trained Model
PaddleClas offers a list of trained [models](../models/models_intro_en.md). If the model to be quantized is not in the list, you need to follow the [regular training](../models_training/classification_en.md) method to get the trained model.
<a name="2"></a>
## 2. Quick Start
Go to PaddleClas root directory
```shell
cd PaddleClas
```
Related code for `slim` training has been integrated under `ppcls/engine/`, and the offline quantization code can be found in `deploy/slim/quant_post_static.py`.
<a name="2.1"></a>
### 2.1 Model Quantization
Quantization includes offline quantization and online quantization training. Online quantization training, the more effective of the two, requires loading a pre-trained model and quantizing it during training once the quantization strategy is defined.
<a name="2.1.1"></a>
#### 2.1.1 Online Quantization Training
Try the following commands:
- CPU/Single GPU
Take CPU as an example; if you use a GPU, change `cpu` to `gpu`.
```
python3.7 tools/train.py -c ppcls/configs/slim/ResNet50_vd_quantization.yaml -o Global.device=cpu
```
The parsing of the `yaml` file is described in the [reference document](../models_training/config_description_en.md). To preserve accuracy, the `yaml` file already specifies the `pretrained model`.
- Launch in single-machine multi-card / multi-machine multi-card mode
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ppcls/configs/slim/ResNet50_vd_quantization.yaml
```
<a name="2.1.2"></a>
#### 2.1.2 Offline Quantization
**Note**: Currently, offline quantization requires the `inference model` exported from the trained model. See the [tutorial](../inference_deployment/export_model_en.md) for how to export the `inference model`.
In general, offline quantization loses more accuracy than online quantization training.
After generating the `inference model`, the offline quantization is run as follows:
```shell
python3.7 deploy/slim/quant_post_static.py -c ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml -o Global.save_inference_dir=./deploy/models/class_ResNet50_vd_ImageNet_infer
```
The `inference model` is stored in `Global.save_inference_dir`.
After successful execution, a `quant_post_static_model` folder is created under `Global.save_inference_dir`. It contains the generated offline quantization model, which can be deployed directly without being exported again.
<a name="2.2"></a>
### 2.2 Model Pruning
Try the following commands:
- CPU/Single GPU
Take CPU as an example; if you use a GPU, change `cpu` to `gpu`.
```shell
python3.7 tools/train.py -c ppcls/configs/slim/ResNet50_vd_prune.yaml -o Global.device=cpu
```
- Launch in single-machine single-card / single-machine multi-card / multi-machine multi-card mode
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ppcls/configs/slim/ResNet50_vd_prune.yaml
```
<a name="3"></a>
## 3. Export the Model
The model saved after online quantization training or pruning can be exported as an inference model for inference deployment. Here we take model pruning as an example:
```
python3.7 tools/export_model.py \
-c ppcls/configs/slim/ResNet50_vd_prune.yaml \
-o Global.pretrained_model=./output/ResNet50_vd/best_model \
-o Global.save_inference_dir=./inference
```
<a name="4"></a>
## 4. Deploy the Model
The exported model can be deployed directly with the inference engine; please refer to [inference deployment](../inference_deployment/).
You can also use PaddleLite's opt tool to convert the inference model into a mobile model for deployment on mobile devices. Please refer to [Mobile Model Deployment](../inference_deployment/paddle_lite_deploy_en.md) for more details.
<a name="5"></a>
## 5. Notes on Training Hyperparameters
- For quantization and pruning training, it is recommended to load the pre-trained model obtained from conventional training to accelerate convergence.
- For quantization training, it is recommended to set the initial learning rate to `1/20~1/10` of that used in conventional training and the number of training epochs to `1/5~1/2`, and to add Warmup to the learning rate strategy (see the sketch after this list). Please make no other modifications to the configuration.
- For pruning training, the hyperparameter configuration is recommended to remain the same as in regular training.
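To make the learning-rate and epoch rules above concrete, the following is a toy Python sketch; the baseline values are illustrative assumptions, not PaddleClas defaults.
```python
# Toy sketch of the rules above; `base_lr` and `base_epochs` are assumed
# conventional-training values, not PaddleClas defaults.
base_lr, base_epochs = 0.1, 200

quant_lr_range = (base_lr / 20, base_lr / 10)             # 1/20 ~ 1/10 of the baseline LR
quant_epoch_range = (base_epochs // 5, base_epochs // 2)  # 1/5 ~ 1/2 of the baseline epochs

print("suggested quantization LR range:", quant_lr_range)
print("suggested quantization epoch range:", quant_epoch_range)
```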

@ -0,0 +1,7 @@
Multilabel Classification
================================
.. toctree::
:maxdepth: 3
multilabel_en.md

@ -0,0 +1,92 @@
# Multilabel classification quick start
Based on the [NUS-WIDE-SCENE](https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html) dataset, which is a subset of the NUS-WIDE dataset, you can experience multilabel classification in PaddleClas, including training, evaluation and prediction. Please refer to [Installation](../../installation/) to install PaddleClas first.
## Preparation
* Enter PaddleClas directory
```
cd path_to_PaddleClas
```
* Create and enter the `dataset/NUS-WIDE-SCENE` directory, then download and decompress the NUS-WIDE-SCENE dataset
```shell
mkdir dataset/NUS-WIDE-SCENE
cd dataset/NUS-WIDE-SCENE
wget https://paddle-imagenet-models-name.bj.bcebos.com/data/NUS-SCENE-dataset.tar
tar -xf NUS-SCENE-dataset.tar
```
* Return to the `PaddleClas` root directory
```
cd ../../
```
## Training
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
```
After training for 10 epochs, the best accuracy over the validation set should be around 0.95.
## Evaluation
```bash
python tools/eval.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
## Prediction
```bash
python3 tools/infer.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
You will get outputs such as the following:
```
[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]
```
## Prediction based on prediction engine
### Export model
```bash
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
By default, the inference model is saved in `./inference` under the current directory.
### Prediction based on prediction engine
Enter the deploy directory:
```bash
cd ./deploy
```
Prediction based on prediction engine:
```
python3 python/predict_cls.py \
-c configs/inference_multilabel_cls.yaml
```
You will get outputs such as the following:
```
0517_2715693311.jpg: class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
```

@ -0,0 +1,264 @@
# Data Augmentation
------
## Catalogue
- [1. Introduction to data augmentation](#1)
- [2. Common data augmentation methods](#2)
- [2.1 Image Transformation](#2.1)
- [2.1.1 AutoAugment](#2.1.1)
- [2.1.2 RandAugment](#2.1.2)
- [2.1.3 TimmAutoAugment](#2.1.3)
- [2.2 Image Cropping](#2.2)
- [2.2.1 Cutout](#2.2.1)
- [2.2.2 RandomErasing](#2.2.2)
- [2.2.3 HideAndSeek](#2.2.3)
- [2.2.4 GridMask](#2.2.4)
- [2.3 Image mix](#2.3)
- [2.3.1 Mixup](#2.3.1)
- [2.3.2 Cutmix](#2.3.2)
<a name="1"></a>
## 1. Introduction to data augmentation
Data augmentation is a commonly used regularization method in image classification tasks, and it is especially useful in scenarios with insufficient data or large models. In this chapter, we mainly introduce 8 image augmentation methods besides the standard augmentation methods. Users can apply these methods to their own tasks for better model performance. Under the same conditions, the performance of these augmentation methods on the ImageNet1k dataset is shown as follows.
![](../../images/image_aug/main_image_aug.png)
<a name="2"></a>
## 2. Common data augmentation methods
Unless otherwise specified, all the examples and experiments in this chapter are based on the ImageNet1k dataset with the network input image size set to 224.
The standard data augmentation pipeline in ImageNet classification tasks contains the following steps (a minimal numpy sketch of the full pipeline is given after the list).
1. Decode image, abbreviated as `ImageDecode`.
2. Randomly crop the image to 224x224, abbreviated as `RandCrop`.
3. Randomly flip the image horizontally, abbreviated as `RandFlip`.
4. Normalize the image pixel values, abbreviated as `Normalize`.
5. Transpose the image from `[224, 224, 3]`(HWC) to `[3, 224, 224]`(CHW), abbreviated as `Transpose`.
6. Group the image data(`[3, 224, 224]`) into a batch(`[N, 3, 224, 224]`), where `N` is the batch size. It is abbreviated as `Batch`.
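The sketch below walks through these steps with plain numpy; a synthetic uint8 array stands in for the decoded JPEG, and the mean/std values are the usual ImageNet statistics used in the PaddleClas configuration files.
```python
import numpy as np

rng = np.random.default_rng(0)
# "ImageDecode": a synthetic HWC uint8 image stands in for a decoded JPEG.
img = rng.integers(0, 256, size=(256, 320, 3), dtype=np.uint8)

# "RandCrop": randomly crop a 224x224 region.
top = rng.integers(0, img.shape[0] - 224 + 1)
left = rng.integers(0, img.shape[1] - 224 + 1)
img = img[top:top + 224, left:left + 224]

# "RandFlip": horizontal flip with probability 0.5.
if rng.random() < 0.5:
    img = img[:, ::-1]

# "Normalize": scale to [0, 1], then per-channel mean/std normalization.
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
img = (img.astype(np.float32) / 255.0 - mean) / std

# "Transpose": HWC -> CHW, then "Batch": stack N samples into [N, 3, 224, 224].
img = img.transpose(2, 0, 1)
batch = np.stack([img], axis=0)
print(batch.shape)  # (1, 3, 224, 224)
```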
Compared with the above standard image augmentation methods, researchers have also proposed many improved image augmentation strategies. These strategies insert certain operations at different stages of the standard pipeline. Based on the stage at which they operate, we divide them into the following three categories.
1. Transformation. Perform some transformations on the image after `RandCrop`, such as AutoAugment and RandAugment.
2. Cropping. Perform some transformations on the image after `Transpose`, such as CutOut, RandErasing, HideAndSeek and GridMask.
3. Mixing. Perform some transformations on the batch after `Batch`, such as Mixup and Cutmix.
Visualization results of some images after augmentation are shown as follows.
![](../../images/image_aug/image_aug_samples_s_en.jpg)
The following table shows more detailed information of the transformations.
| Method | Input | Output | Auto-<br>Augment\[1\] | Rand-<br>Augment\[2\] | CutOut\[3\] | Rand<br>Erasing\[4\] | HideAnd-<br>Seek\[5\] | GridMask\[6\] | Mixup\[7\] | Cutmix\[8\] |
|-------------|---------------------------|---------------------------|------------------|------------------|-------------|------------------|------------------|---------------|------------|------------|
| Image<br>Decode | Binary | (224, 224, 3)<br>uint8 | Y | Y | Y | Y | Y | Y | Y | Y |
| RandCrop | (:, :, 3)<br>uint8 | (224, 224, 3)<br>uint8 | Y | Y | Y | Y | Y | Y | Y | Y |
| **Process** | (224, 224, 3)<br>uint8 | (224, 224, 3)<br>uint8 | Y | Y | \- | \- | \- | \- | \- | \- |
| RandFlip | (224, 224, 3)<br>uint8 | (224, 224, 3)<br>uint8 | Y | Y | Y | Y | Y | Y | Y | Y |
| Normalize | (224, 224, 3)<br>uint8 | (224, 224, 3)<br>float32 | Y | Y | Y | Y | Y | Y | Y | Y |
| Transpose | (224, 224, 3)<br>float32 | (3, 224, 224)<br>float32 | Y | Y | Y | Y | Y | Y | Y | Y |
| **Process** | (3, 224, 224)<br>float32 | (3, 224, 224)<br>float32 | \- | \- | Y | Y | Y | Y | \- | \- |
| Batch | (3, 224, 224)<br>float32 | (N, 3, 224, 224)<br>float32 | Y | Y | Y | Y | Y | Y | Y | Y |
| **Process** | (N, 3, 224, 224)<br>float32 | (N, 3, 224, 224)<br>float32 | \- | \- | \- | \- | \- | \- | Y | Y |
PaddleClas integrates all the above data augmentation strategies. More details, including the principles and usage of each strategy, are introduced in the following chapters. For better visualization, we use the following figure to show the changes after the transformations, where `RandCrop` is replaced with `Resize` for simplification.
![](../../images/image_aug/test_baseline.jpeg)
<a name="2.1"></a>
### 2.1 Image Transformation
Transformation means performing some transformations on the image after `RandCrop`. It mainly contains AutoAugment and RandAugment.
<a name="2.1.1"></a>
#### 2.1.1 AutoAugment
Address: [https://arxiv.org/abs/1805.09501v1](https://arxiv.org/abs/1805.09501v1)
Github repo: [https://github.com/DeepVoltaire/AutoAugment](https://github.com/DeepVoltaire/AutoAugment)
Unlike conventional, manually designed image augmentation methods, AutoAugment is an augmentation solution tailored to a specific dataset, found by a search algorithm over a search space of image augmentation sub-strategies. For the ImageNet dataset, the final solution contains 25 sub-strategy combinations, and each sub-strategy contains two transformations. For each image, a sub-strategy combination is randomly selected, and each transformation in the sub-strategy is then applied with a certain probability.
The images after `AutoAugment` are as follows.
![](../../images/image_aug/test_autoaugment.jpeg)
<a name="2.1.2"></a>
#### 2.1.2 RandAugment
Address: [https://arxiv.org/pdf/1909.13719.pdf](https://arxiv.org/pdf/1909.13719.pdf)
Github repo: [https://github.com/heartInsert/randaugment](https://github.com/heartInsert/randaugment)
The search method of `AutoAugment` is fairly brute-force: searching for the optimal strategy directly on the target dataset requires a large amount of computation. In `RandAugment`, the author observed that, on the one hand, for larger models and larger datasets, the gains from the policy searched by `AutoAugment` become smaller; on the other hand, the searched policy is tied to a particular dataset, so it generalizes poorly and is not suitable for other datasets.
In `RandAugment`, the author proposes a random augmentation method. Instead of using a specific probability to determine whether to use a certain sub-strategy, all sub-strategies are selected with the same probability. The experiments in the paper also show that this method performs well even for large models.
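As a rough illustration of this uniform sampling rule, here is a minimal sketch; the operator names are placeholders rather than the exact operator set used by PaddleClas, and the default `num_layers=2`, `magnitude=5` mirror the values used in the configuration section.
```python
import random

# Minimal sketch of the uniform sampling rule: pick `num_layers` transforms
# uniformly at random (no learned per-policy probabilities) and apply each at
# a fixed `magnitude`. The operator names below are placeholders.
CANDIDATE_OPS = ["ShearX", "ShearY", "Rotate", "Color", "Contrast", "Brightness"]

def sample_randaugment_policy(num_layers=2, magnitude=5):
    """Return the (op_name, magnitude) pairs sampled for a single image."""
    return [(random.choice(CANDIDATE_OPS), magnitude) for _ in range(num_layers)]

print(sample_randaugment_policy())
```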
The images after `RandAugment` are as follows.
![](../../images/image_aug/test_randaugment.jpeg)
<a name="2.1.3"></a>
#### 2.1.3 TimmAutoAugment
Github repo: [https://github.com/rwightman/pytorch-image-models/blob/master/timm/data/auto_augment.py](https://github.com/rwightman/pytorch-image-models/blob/master/timm/data/auto_augment.py)
`TimmAutoAugment` is an improvement of AutoAugment and RandAugment from the open-source community. Practice has shown that it performs better on many vision tasks, and at present most VisionTransformer models are trained with it.
<a name="2.2"></a>
### 2.2 Image Cropping
Cropping means performing some transformations on the image after `Transpose`, setting the pixels of the cropped area to a constant value. It mainly contains CutOut, RandErasing, HideAndSeek and GridMask.
Image cropping methods can be applied before or after normalization. The difference is that if we crop the image before normalization and fill the areas with 0, the pixel values of the cropped areas will no longer be 0 after normalization, which changes the grayscale distribution of the data.
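A quick numpy check of this point, using the commonly used ImageNet normalization statistics: a pixel filled with 0 before normalization ends up far from 0 afterwards.
```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
filled_pixel = np.zeros(3)            # a pixel zeroed out before normalization
print((filled_pixel - mean) / std)    # roughly [-2.12, -2.04, -1.80], not 0
```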
The ideas behind the above cropping transformations are similar: they all aim to address the poor generalization of trained models on occluded images, and they differ only in their cropping details.
<a name="2.2.1"></a>
#### 2.2.1 Cutout
Address: [https://arxiv.org/abs/1708.04552](https://arxiv.org/abs/1708.04552)
Github repo: [https://github.com/uoguelph-mlrg/Cutout](https://github.com/uoguelph-mlrg/Cutout)
Cutout is a kind of dropout that occludes the input image rather than the feature map, and it is more robust to noise than dropout. Cutout has two advantages: (1) it can simulate situations where the subject is partially occluded; (2) it encourages the model to make use of more of the image content for classification and prevents the network from focusing only on the salient area, which would otherwise cause overfitting.
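A minimal numpy sketch of the idea, not the PaddleClas operator itself; the square size and fill value are assumed hyperparameters.
```python
import numpy as np

def cutout(img_chw, length=56, fill_value=0.0, rng=None):
    """Fill one square region, centred at a random location, with a constant."""
    if rng is None:
        rng = np.random.default_rng()
    _, h, w = img_chw.shape
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - length // 2, 0), min(cy + length // 2, h)
    x1, x2 = max(cx - length // 2, 0), min(cx + length // 2, w)
    out = img_chw.copy()
    out[:, y1:y2, x1:x2] = fill_value
    return out

# Toy usage on a random CHW float image.
img = np.random.default_rng(0).random((3, 224, 224), dtype=np.float32)
print(cutout(img).mean() < img.mean())  # the filled square lowers the mean
```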
The images after `Cutout` are as follows.
![](../../images/image_aug/test_cutout.jpeg)
<a name="2.2.2"></a>
#### 2.2.2 RandomErasing
Address: [https://arxiv.org/pdf/1708.04896.pdf](https://arxiv.org/pdf/1708.04896.pdf)
Github repo: [https://github.com/zhunzhong07/Random-Erasing](https://github.com/zhunzhong07/Random-Erasing)
RandomErasing is similar to Cutout and also aims to improve the generalization of trained models on occluded images. The authors point out in the paper that random cropping is complementary to random horizontal flipping, and they verified the effectiveness of the method on pedestrian re-identification (ReID). Unlike `Cutout`, `RandomErasing` is applied to the image only with a certain probability, and the size and aspect ratio of the generated mask are also drawn randomly according to pre-defined hyperparameters.
The images after `RandomErasing` are as follows.
![](../../images/image_aug/test_randomerassing.jpeg)
<a name="2.2.3"></a>
#### 2.2.3 HideAndSeek
Address: [https://arxiv.org/pdf/1811.02545.pdf](https://arxiv.org/pdf/1811.02545.pdf)
Github repo: [https://github.com/kkanshul/Hide-and-Seek](https://github.com/kkanshul/Hide-and-Seek)
For `HideAndSeek`, the image is divided into patches, and a mask is generated for each patch with a certain probability. The meaning of the masks in different areas is shown in the figure below.
![](../../images/image_aug/hide-and-seek-visual.png)
The images after `HideAndSeek` are as follows.
![](../../images/image_aug/test_hideandseek.jpeg)
<a name="2.2.4"></a>
#### 2.2.4 GridMask
Address: [https://arxiv.org/abs/2001.04086](https://arxiv.org/abs/2001.04086)
Github repo: [https://github.com/akuxcw/GridMask](https://github.com/akuxcw/GridMask)
The author points out that the previous cropping-based methods have two problems, as shown in the following figure:
1. Excessive deletion may remove most or all of the target subject or lose the context information, so the augmented images become noisy data.
2. Retaining too much area has little effect on the object and context, so the augmentation achieves little.
![](../../images/image_aug/gridmask-0.png)
Therefore, how to avoid excessive deletion and excessive retention becomes the core problem to be solved.
`GridMask` generates a mask with the same resolution as the original image and multiplies it with the original image. The mask grid and its size are adjusted by hyperparameters.
In the training process, there are two ways to use it:
1. Set a probability p and apply GridMask to each image with probability p from the beginning of training.
2. Set the augmentation probability to 0 at first and increase it linearly with the number of iterations from 0 to p.
Experiments show that the second schedule works better; a minimal sketch of it follows.
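A minimal sketch of this second schedule, assuming a linear ramp; the target probability `p` and `total_iters` below are illustrative values, not PaddleClas defaults.
```python
def gridmask_prob(cur_iter, total_iters=10000, p=0.7):
    """Linearly ramp the probability of applying GridMask from 0 up to p."""
    return min(p, p * cur_iter / total_iters)

for it in (0, 2500, 5000, 10000, 20000):
    print(it, round(gridmask_prob(it), 3))
```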
The images after `GridMask` are as follows.
![](../../images/image_aug/test_gridmask.jpeg)
<a name="2.3"></a>
### 2.3 Image mix
Mixing means performing some transformations after `Batch`; it mainly contains Mixup and Cutmix.
The data augmentation methods introduced before operate on a single image, while mixing operates on a batch to generate a new batch.
<a name="2.3.1"></a>
#### 2.3.1 Mixup
Address: [https://arxiv.org/pdf/1710.09412.pdf](https://arxiv.org/pdf/1710.09412.pdf)
Github repo: [https://github.com/facebookresearch/mixup-cifar10](https://github.com/facebookresearch/mixup-cifar10)
Mixup is the first mixing-based solution. It is easy to implement and performs well not only on image classification but also on object detection. For simplicity, Mixup is usually carried out within a batch, and so is `Cutmix`.
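A minimal numpy sketch of batch-level Mixup, assuming one-hot (soft) labels and a Beta-distributed mixing coefficient; `alpha` is an assumed hyperparameter, and this is not the PaddleClas operator itself.
```python
import numpy as np

def mixup_batch(images, labels_onehot, alpha=0.2, rng=None):
    """Blend every sample (image and label) with a random partner in the batch."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)                  # mixing coefficient
    perm = rng.permutation(images.shape[0])       # partner index for each sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_images, mixed_labels

# Toy usage on a random batch of four images and 10-class one-hot labels.
rng = np.random.default_rng(0)
x = rng.random((4, 3, 224, 224), dtype=np.float32)
y = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=4)]
mx, my = mixup_batch(x, y, rng=rng)
print(mx.shape, my.shape)  # (4, 3, 224, 224) (4, 10)
```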
The images after `Mixup` are as follows.
![](../../images/image_aug/test_mixup.png)
<a name="2.3.2"></a>
#### 2.3.2 Cutmix
Address: [https://arxiv.org/pdf/1905.04899v2.pdf](https://arxiv.org/pdf/1905.04899v2.pdf)
Github repo: [https://github.com/clovaai/CutMix-PyTorch](https://github.com/clovaai/CutMix-PyTorch)
Unlike `Mixup`, which directly blends two whole images, `Cutmix` randomly cuts an `ROI` out of one image and pastes it onto the corresponding area of another image.
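A minimal numpy sketch of batch-level CutMix along the same lines, with the label weight recomputed from the actual pasted area; `alpha` is an assumed hyperparameter, and this is not the PaddleClas operator itself.
```python
import numpy as np

def cutmix_batch(images, labels_onehot, alpha=0.2, rng=None):
    """Paste a random box from a partner image onto each image in the batch."""
    if rng is None:
        rng = np.random.default_rng()
    n, _, h, w = images.shape
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(n)

    # Box whose area is roughly (1 - lam) of the image, centred at a random point.
    cut_ratio = float(np.sqrt(1.0 - lam))
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed = images.copy()
    mixed[:, :, y1:y2, x1:x2] = images[perm][:, :, y1:y2, x1:x2]
    # Mix labels in proportion to the area that was actually kept.
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_labels = lam_adj * labels_onehot + (1.0 - lam_adj) * labels_onehot[perm]
    return mixed, mixed_labels
```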
The images after `Cutmix` are as follows.
![](../../images/image_aug/test_cutmix.png)
For the practical part of data augmentation, please refer to [Data Augmentation Practice](../advanced_tutorials/DataAugmentation_en.md).
## Reference
[1] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
[2] Cubuk E D, Zoph B, Shlens J, et al. Randaugment: Practical automated data augmentation with a reduced search space[J]. arXiv preprint arXiv:1909.13719, 2019.
[3] DeVries T, Taylor G W. Improved regularization of convolutional neural networks with cutout[J]. arXiv preprint arXiv:1708.04552, 2017.
[4] Zhong Z, Zheng L, Kang G, et al. Random erasing data augmentation[J]. arXiv preprint arXiv:1708.04896, 2017.
[5] Singh K K, Lee Y J. Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization[C]//2017 IEEE international conference on computer vision (ICCV). IEEE, 2017: 3544-3553.
[6] Chen P. GridMask Data Augmentation[J]. arXiv preprint arXiv:2001.04086, 2020.
[7] Zhang H, Cisse M, Dauphin Y N, et al. mixup: Beyond empirical risk minimization[J]. arXiv preprint arXiv:1710.09412, 2017.
[8] Yun S, Han D, Oh S J, et al. Cutmix: Regularization strategy to train strong classifiers with localizable features[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 6023-6032.

@ -0,0 +1,599 @@
# ImageNet Model zoo overview
## Catalogue
- [1. Model library overview diagram](#1)
- [2. SSLD pretrained models](#2)
- [2.1 Server-side knowledge distillation model](#2.1)
- [2.2 Mobile-side knowledge distillation model](#2.2)
- [2.3 Intel-CPU-side knowledge distillation model](#2.3)
- [3. PP-LCNet series](#3)
- [4. ResNet series](#4)
- [5. Mobile series](#5)
- [6. SEResNeXt and Res2Net series](#6)
- [7. DPN and DenseNet series](#7)
- [8. HRNet series](#8)
- [9. Inception series](#9)
- [10. EfficientNet and ResNeXt101_wsl series](#10)
- [11. ResNeSt and RegNet series](#11)
- [12. ViT and DeiT series](#12)
- [13. RepVGG series](#13)
- [14. MixNet series](#14)
- [15. ReXNet series](#15)
- [16. SwinTransformer series](#16)
- [17. LeViT series](#17)
- [18. Twins series](#18)
- [19. HarDNet series](#19)
- [20. DLA series](#20)
- [21. RedNet series](#21)
- [22. TNT series](#22)
- [23. Other models](#23)
- [Reference](#reference)
<a name="1"></a>
## 1. Model library overview diagram
Based on the ImageNet-1k classification dataset, the 37 classification network structures supported by PaddleClas and the corresponding 217 image classification pretrained models are shown below. Training tricks, a brief introduction to each series of network structures, and performance evaluation will be shown in the corresponding chapters. The evaluation environment is as follows.
* Arm CPU evaluation environment is based on Snapdragon 855 (SD855).
* Intel CPU evaluation environment is based on Intel(R) Xeon(R) Gold 6148.
* The GPU evaluation speed is measured by running 2100 times under the FP32+TensorRT configuration (excluding the warmup time of the first 100 times).
* FLOPs and Params are calculated by `paddle.flops()` (PaddlePaddle version 2.2); a short usage sketch follows this list.
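As a hedged illustration of how such FLOPs/Params figures can be computed, the snippet below calls `paddle.flops()` on the `resnet50` from `paddle.vision.models`; the numbers in the tables themselves come from the PaddleClas model implementations and may differ slightly.
```python
import paddle
from paddle.vision.models import resnet50

# Count FLOPs and parameters for a 1x3x224x224 input (PaddlePaddle 2.x API).
model = resnet50()
paddle.flops(model, input_size=[1, 3, 224, 224], print_detail=False)
```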
Curves of accuracy to the inference time of common server-side models are shown as follows.
![](../../images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.png)
Curves of accuracy to the inference time of common mobile-side models are shown as follows.
![](../../images/models/mobile_arm_top1.png)
Curves of accuracy to the inference time of some VisionTransformer models are shown as follows.
![](../../images/models/V100_benchmark/v100.fp32.bs1.visiontransformer.png)
<a name="2"></a>
## 2. SSLD pretrained models
The accuracy and inference time of the pretrained models based on SSLD distillation are as follows. More detailed information can be found in the [SSLD distillation tutorial](../advanced_tutorials/distillation/distillation_en.md).
<a name="2.1"></a>
### 2.1 Server-side knowledge distillation model
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|-----------------------------------|-----------------------------------|
| ResNet34_vd_ssld | 0.797 | 0.760 | 0.037 | 2.00 | 3.28 | 5.84 | 3.93 | 21.84 | <span style="white-space:nowrap;">[Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams)&emsp;&emsp;</span> | <span style="white-space:nowrap;">[Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet34_vd_ssld_infer.tar)&emsp;&emsp;</span> |
| ResNet50_vd_ssld | 0.830 | 0.792 | 0.039 | 2.60 | 4.86 | 7.63 | 4.35 | 25.63 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_ssld_infer.tar) |
| ResNet101_vd_ssld | 0.837 | 0.802 | 0.035 | 4.43 | 8.25 | 12.60 | 8.08 | 44.67 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet101_vd_ssld_infer.tar) |
| Res2Net50_vd_26w_4s_ssld | 0.831 | 0.798 | 0.033 | 3.59 | 6.35 | 9.50 | 4.28 | 25.76 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Res2Net50_vd_26w_4s_ssld_infer.tar) |
| Res2Net101_vd_<br>26w_4s_ssld | 0.839 | 0.806 | 0.033 | 6.34 | 11.02 | 16.13 | 8.35 | 45.35 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Res2Net101_vd_26w_4s_ssld_infer.tar) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.851 | 0.812 | 0.049 | 11.45 | 19.77 | 28.81 | 15.77 | 76.44 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Res2Net200_vd_26w_4s_ssld_infer.tar) |
| HRNet_W18_C_ssld | 0.812 | 0.769 | 0.043 | 6.66 | 8.94 | 11.95 | 4.32 | 21.35 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W18_C_ssld_infer.tar) |
| HRNet_W48_C_ssld | 0.836 | 0.790 | 0.046 | 11.07 | 17.06 | 27.28 | 17.34 | 77.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W48_C_ssld_infer.tar) |
| SE_HRNet_W64_C_ssld | 0.848 | - | - | 17.11 | 26.87 | 43.24 | 29.00 | 129.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SE_HRNet_W64_C_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SE_HRNet_W64_C_ssld_infer.tar) |
<a name="2.2"></a>
### 2.2 Mobile-side knowledge distillation model
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | SD855 time(ms)<br>bs=1, thread=1 | SD855 time(ms)<br/>bs=1, thread=2 | SD855 time(ms)<br/>bs=1, thread=4 | FLOPs(M) | Params(M) | <span style="white-space:nowrap;">Model size (M)</span> | Pretrained Model Download Address | Inference Model Download Address |
|---------------------|-----------|-----------|---------------|----------------|-----------|----------|-----------|-----------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|
| MobileNetV1_ssld | 0.779 | 0.710 | 0.069 | 30.24 | 17.86 | 10.30 | 578.88 | 4.25 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_ssld_infer.tar) |
| MobileNetV2_ssld | 0.767 | 0.722 | 0.045 | 20.74 | 12.71 | 8.10 | 327.84 | 3.54 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV2_ssld_infer.tar) |
| MobileNetV3_small_x0_35_ssld | 0.556 | 0.530 | 0.026 | 2.23 | 1.66 | 1.43 | 14.56 | 1.67 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_small_x0_35_ssld_infer.tar) |
| MobileNetV3_large_x1_0_ssld | 0.790 | 0.753 | 0.036 | 16.55 | 10.09 | 6.84 | 229.66 | 5.50 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_ssld_infer.tar) |
| MobileNetV3_small_x1_0_ssld | 0.713 | 0.682 | 0.031 | 5.63 | 3.65 | 2.60 | 63.67 | 2.95 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_small_x1_0_ssld_infer.tar) |
| GhostNet_x1_3_ssld | 0.794 | 0.757 | 0.037 | 19.16 | 12.25 | 9.40 | 236.89 | 7.38 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/GhostNet_x1_3_ssld_infer.tar) |
<a name="2.3"></a>
### 2.3 Intel-CPU-side knowledge distillation model
| Model | Top-1 Acc | Reference<br>Top-1 Acc | Acc gain | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | FLOPs(M) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|---------------------|-----------|-----------|---------------|----------------|----------|-----------|-----------------------------------|-----------------------------------|
| PPLCNet_x0_5_ssld | 0.661 | 0.631 | 0.030 | 2.05 | 47.28 | 1.89 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_5_ssld_infer.tar) |
| PPLCNet_x1_0_ssld | 0.744 | 0.713 | 0.033 | 2.46 | 160.81 | 2.96 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_0_ssld_infer.tar) |
| PPLCNet_x2_5_ssld | 0.808 | 0.766 | 0.042 | 5.39 | 906.49 | 9.04 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_5_ssld_infer.tar) |
* Note: `Reference Top-1 Acc` means the accuracy of the pre-trained model obtained by PaddleClas based on ImageNet1k dataset training.
<a name="3"></a>
## 3. PP-LCNet series <sup>[[28](#ref28)]</sup>
The accuracy and speed indicators of the PP-LCNet series models are shown in the following table. For more information about this series of models, please refer to: [PP-LCNet series model documents](../models/PP-LCNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | Intel-Xeon-Gold-6148 time(ms)<br>bs=1 | FLOPs(M) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|:--:|:--:|:--:|:--:|----|----|----|:--:|
| PPLCNet_x0_25 |0.5186 | 0.7565 | 1.61785 | 18.25 | 1.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_25_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_25_infer.tar) |
| PPLCNet_x0_35 |0.5809 | 0.8083 | 2.11344 | 29.46 | 1.65 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_35_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_35_infer.tar) |
| PPLCNet_x0_5 |0.6314 | 0.8466 | 2.72974 | 47.28 | 1.89 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_5_infer.tar) |
| PPLCNet_x0_75 |0.6818 | 0.8830 | 4.51216 | 98.82 | 2.37 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x0_75_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x0_75_infer.tar) |
| PPLCNet_x1_0 |0.7132 | 0.9003 | 6.49276 | 160.81 | 2.96 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_0_infer.tar) |
| PPLCNet_x1_5 |0.7371 | 0.9153 | 12.2601 | 341.86 | 4.52 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x1_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x1_5_infer.tar) |
| PPLCNet_x2_0 |0.7518 | 0.9227 | 20.1667 | 590 | 6.54 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_0_infer.tar) |
| PPLCNet_x2_5 |0.7660 | 0.9300 | 29.595 | 906 | 9.04 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/PPLCNet_x2_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/PPLCNet_x2_5_infer.tar) |
<a name="4"></a>
## 4. ResNet series <sup>[[1](#ref1)]</sup>
The accuracy and speed indicators of ResNet and ResNet_vd series models are shown in the following table. For more information about this series of models, please refer to: [ResNet and ResNet_vd series model documents](../models/ResNet_and_vd_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|---------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| ResNet18 | 0.7098 | 0.8992 | 1.22 | 2.19 | 3.63 | 1.83 | 11.70 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet18_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet18_infer.tar) |
| ResNet18_vd | 0.7226 | 0.9080 | 1.26 | 2.28 | 3.89 | 2.07 | 11.72 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet18_vd_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet18_vd_infer.tar) |
| ResNet34 | 0.7457 | 0.9214 | 1.97 | 3.25 | 5.70 | 3.68 | 21.81 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet34_infer.tar) |
| ResNet34_vd | 0.7598 | 0.9298 | 2.00 | 3.28 | 5.84 | 3.93 | 21.84 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet34_vd_infer.tar) |
| ResNet34_vd_ssld | 0.7972 | 0.9490 | 2.00 | 3.28 | 5.84 | 3.93 | 21.84 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet34_vd_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet34_vd_ssld_infer.tar) |
| ResNet50 | 0.7650 | 0.9300 | 2.54 | 4.79 | 7.40 | 4.11 | 25.61 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_infer.tar) |
| ResNet50_vc | 0.7835 | 0.9403 | 2.57 | 4.83 | 7.52 | 4.35 | 25.63 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vc_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vc_infer.tar) |
| ResNet50_vd | 0.7912 | 0.9444 | 2.60 | 4.86 | 7.63 | 4.35 | 25.63 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar) |
| ResNet101 | 0.7756 | 0.9364 | 4.37 | 8.18 | 12.38 | 7.83 | 44.65 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet101_infer.tar) |
| ResNet101_vd | 0.8017 | 0.9497 | 4.43 | 8.25 | 12.60 | 8.08 | 44.67 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet101_vd_infer.tar) |
| ResNet152 | 0.7826 | 0.9396 | 6.05 | 11.41 | 17.33 | 11.56 | 60.34 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet152_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet152_infer.tar) |
| ResNet152_vd | 0.8059 | 0.9530 | 6.11 | 11.51 | 17.59 | 11.80 | 60.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet152_vd_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet152_vd_infer.tar) |
| ResNet200_vd | 0.8093 | 0.9533 | 7.70 | 14.57 | 22.16 | 15.30 | 74.93 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet200_vd_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet200_vd_infer.tar) |
| ResNet50_vd_<br>ssld | 0.8300 | 0.9640 | 2.60 | 4.86 | 7.63 | 4.35 | 25.63 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_ssld_infer.tar) |
| ResNet101_vd_<br>ssld | 0.8373 | 0.9669 | 4.43 | 8.25 | 12.60 | 8.08 | 44.67 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet101_vd_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet101_vd_ssld_infer.tar) |
<a name="5"></a>
## 5. Mobile series <sup>[[3](#ref3)][[4](#ref4)][[5](#ref5)][[6](#ref6)][[23](#ref23)]</sup>
The accuracy and speed indicators of the mobile series models are shown in the following table. For more information about this series, please refer to: [Mobile series model documents](../models/Mobile_en.md).
| Model | Top-1 Acc | Top-5 Acc | SD855 time(ms)<br>bs=1, thread=1 | SD855 time(ms)<br/>bs=1, thread=2 | SD855 time(ms)<br/>bs=1, thread=4 | FLOPs(M) | Params(M) | <span style="white-space:nowrap;">Model size (M)</span> | Pretrained Model Download Address | Inference Model Download Address |
|----------------------------------|-----------|-----------|------------------------|----------|-----------|---------|-----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| MobileNetV1_<br>x0_25 | 0.5143 | 0.7546 | 2.88 | 1.82 | 1.26 | 43.56 | 0.48 | 1.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_x0_25_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_x0_25_infer.tar) |
| MobileNetV1_<br>x0_5 | 0.6352 | 0.8473 | 8.74 | 5.26 | 3.09 | 154.57 | 1.34 | 5.2 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_x0_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_x0_5_infer.tar) |
| MobileNetV1_<br>x0_75 | 0.6881 | 0.8823 | 17.84 | 10.61 | 6.21 | 333.00 | 2.60 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_x0_75_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_x0_75_infer.tar) |
| MobileNetV1 | 0.7099 | 0.8968 | 30.24 | 17.86 | 10.30 | 578.88 | 4.25 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_infer.tar) |
| MobileNetV1_<br>ssld | 0.7789 | 0.9394 | 30.24 | 17.86 | 10.30 | 578.88 | 4.25 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV1_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV1_ssld_infer.tar) |
| MobileNetV2_<br>x0_25 | 0.5321 | 0.7652 | 3.46 | 2.51 | 2.03 | 34.18 | 1.53 | 6.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_25_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV2_x0_25_infer.tar) |
| MobileNetV2_<br>x0_5 | 0.6503 | 0.8572 | 7.69 | 4.92 | 3.57 | 99.48 | 1.98 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV2_x0_5_infer.tar) |
| MobileNetV2_<br>x0_75 | 0.6983 | 0.8901 | 13.69 | 8.60 | 5.82 | 197.37 | 2.65 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x0_75_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV2_x0_75_infer.tar) |
| MobileNetV2 | 0.7215 | 0.9065 | 20.74 | 12.71 | 8.10 | 327.84 | 3.54 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV2_infer.tar) |
| MobileNetV2_<br>x1_5 | 0.7412 | 0.9167 | 40.79 | 24.49 | 15.50 | 702.35 | 6.90 | 26 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x1_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV2_x1_5_infer.tar) |
| MobileNetV2_<br>x2_0 | 0.7523 | 0.9258 | 67.50 | 40.03 | 25.55 | 1217.25 | 11.33 | 43 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_x2_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV2_x2_0_infer.tar) |
| MobileNetV2_<br>ssld | 0.7674 | 0.9339 | 20.74 | 12.71 | 8.10 | 327.84 | 3.54 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV2_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV2_ssld_infer.tar) |
| MobileNetV3_<br>large_x1_25 | 0.7641 | 0.9295 | 24.52 | 14.76 | 9.89 | 362.70 | 7.47 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_25_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_25_infer.tar) |
| MobileNetV3_<br>large_x1_0 | 0.7532 | 0.9231 | 16.55 | 10.09 | 6.84 | 229.66 | 5.50 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar) |
| MobileNetV3_<br>large_x0_75 | 0.7314 | 0.9108 | 11.53 | 7.06 | 4.94 | 151.70 | 3.93 | 16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x0_75_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x0_75_infer.tar) |
| MobileNetV3_<br>large_x0_5 | 0.6924 | 0.8852 | 6.50 | 4.22 | 3.15 | 71.83 | 2.69 | 11 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x0_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x0_5_infer.tar) |
| MobileNetV3_<br>large_x0_35 | 0.6432 | 0.8546 | 4.43 | 3.11 | 2.41 | 40.90 | 2.11 | 8.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x0_35_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x0_35_infer.tar) |
| MobileNetV3_<br>small_x1_25 | 0.7067 | 0.8951 | 7.88 | 4.91 | 3.45 | 100.07 | 3.64 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_25_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_small_x1_25_infer.tar) |
| MobileNetV3_<br>small_x1_0 | 0.6824 | 0.8806 | 5.63 | 3.65 | 2.60 | 63.67 | 2.95 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_small_x1_0_infer.tar) |
| MobileNetV3_<br>small_x0_75 | 0.6602 | 0.8633 | 4.50 | 2.96 | 2.19 | 46.02 | 2.38 | 9.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_75_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_small_x0_75_infer.tar) |
| MobileNetV3_<br>small_x0_5 | 0.5921 | 0.8152 | 2.89 | 2.04 | 1.62 | 22.60 | 1.91 | 7.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_small_x0_5_infer.tar) |
| MobileNetV3_<br>small_x0_35 | 0.5303 | 0.7637 | 2.23 | 1.66 | 1.43 | 14.56 | 1.67 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_small_x0_35_infer.tar) |
| MobileNetV3_<br>small_x0_35_ssld | 0.5555 | 0.7771 | 2.23 | 1.66 | 1.43 | 14.56 | 1.67 | 6.9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x0_35_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_small_x0_35_ssld_infer.tar) |
| MobileNetV3_<br>large_x1_0_ssld | 0.7896 | 0.9448 | 16.55 | 10.09 | 6.84 | 229.66 | 5.50 | 21 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_large_x1_0_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_ssld_infer.tar) |
| MobileNetV3_small_<br>x1_0_ssld | 0.7129 | 0.9010 | 5.63 | 3.65 | 2.60 | 63.67 | 2.95 | 12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/MobileNetV3_small_x1_0_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_small_x1_0_ssld_infer.tar) |
| ShuffleNetV2 | 0.6880 | 0.8845 | 9.72 | 5.97 | 4.13 | 148.86 | 2.29 | 9 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ShuffleNetV2_x1_0_infer.tar) |
| ShuffleNetV2_<br>x0_25 | 0.4990 | 0.7379 | 1.94 | 1.53 | 1.43 | 18.95 | 0.61 | 2.7 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_25_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ShuffleNetV2_x0_25_infer.tar) |
| ShuffleNetV2_<br>x0_33 | 0.5373 | 0.7705 | 2.23 | 1.70 | 1.79 | 24.04 | 0.65 | 2.8 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_33_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ShuffleNetV2_x0_33_infer.tar) |
| ShuffleNetV2_<br>x0_5 | 0.6032 | 0.8226 | 3.67 | 2.63 | 2.06 | 42.58 | 1.37 | 5.6 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x0_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ShuffleNetV2_x0_5_infer.tar) |
| ShuffleNetV2_<br>x1_5 | 0.7163 | 0.9015 | 17.21 | 10.56 | 6.81 | 301.35 | 3.53 | 14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x1_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ShuffleNetV2_x1_5_infer.tar) |
| ShuffleNetV2_<br>x2_0 | 0.7315 | 0.9120 | 31.21 | 18.98 | 11.65 | 571.70 | 7.40 | 28 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_x2_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ShuffleNetV2_x2_0_infer.tar) |
| ShuffleNetV2_<br>swish | 0.7003 | 0.8917 | 31.21 | 9.06 | 5.74 | 148.86 | 2.29 | 9.1 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ShuffleNetV2_swish_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ShuffleNetV2_swish_infer.tar) |
| GhostNet_<br>x0_5 | 0.6688 | 0.8695 | 5.28 | 3.95 | 3.29 | 46.15 | 2.60 | 10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x0_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/GhostNet_x0_5_infer.tar) |
| GhostNet_<br>x1_0 | 0.7402 | 0.9165 | 12.89 | 8.66 | 6.72 | 148.78 | 5.21 | 20 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/GhostNet_x1_0_infer.tar) |
| GhostNet_<br>x1_3 | 0.7579 | 0.9254 | 19.16 | 12.25 | 9.40 | 236.89 | 7.38 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/GhostNet_x1_3_infer.tar) |
| GhostNet_<br>x1_3_ssld | 0.7938 | 0.9449 | 19.16 | 12.25 | 9.40 | 236.89 | 7.38 | 29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GhostNet_x1_3_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/GhostNet_x1_3_ssld_infer.tar) |
| ESNet_x0_25 | 0.6248 | 0.8346 |4.12|2.97|2.51| 30.85 | 2.83 | 11 |[Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ESNet_x0_25_pretrained.pdparams) |[Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ESNet_x0_25_infer.tar) |
| ESNet_x0_5 | 0.6882 | 0.8804 |6.45|4.42|3.35| 67.31 | 3.25 | 13 |[Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ESNet_x0_5_pretrained.pdparams) |[Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ESNet_x0_5_infer.tar) |
| ESNet_x0_75 | 0.7224 | 0.9045 |9.59|6.28|4.52| 123.74 | 3.87 | 15 |[Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ESNet_x0_75_pretrained.pdparams) |[Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ESNet_x0_75_infer.tar) |
| ESNet_x1_0 | 0.7392 | 0.9140 |13.67|8.71|5.97| 197.33 | 4.64 | 18 |[Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ESNet_x1_0_pretrained.pdparams) |[Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ESNet_x1_0_infer.tar) |
<a name="6"></a>
## 6. SEResNeXt and Res2Net series <sup>[[7](#ref7)][[8](#ref8)][[9](#ref9)]</sup>
The accuracy and speed indicators of the SEResNeXt and Res2Net series models are shown in the following table. For more information about the models of this series, please refer to: [SEResNeXt and Res2Net series model documents](../models/SEResNext_and_Res2Net_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|---------------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
| Res2Net50_<br>26w_4s | 0.7933 | 0.9457 | 3.52 | 6.23 | 9.30 | 4.28 | 25.76 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_26w_4s_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Res2Net50_26w_4s_infer.tar) |
| Res2Net50_vd_<br>26w_4s | 0.7975 | 0.9491 | 3.59 | 6.35 | 9.50 | 4.52 | 25.78 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_vd_26w_4s_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Res2Net50_vd_26w_4s_infer.tar) |
| Res2Net50_<br>14w_8s | 0.7946 | 0.9470 | 4.39 | 7.21 | 10.38 | 4.20 | 25.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net50_14w_8s_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Res2Net50_14w_8s_infer.tar) |
| Res2Net101_vd_<br>26w_4s | 0.8064 | 0.9522 | 6.34 | 11.02 | 16.13 | 8.35 | 45.35 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net101_vd_26w_4s_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Res2Net101_vd_26w_4s_infer.tar) |
| Res2Net200_vd_<br>26w_4s | 0.8121 | 0.9571 | 11.45 | 19.77 | 28.81 | 15.77 | 76.44 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Res2Net200_vd_26w_4s_infer.tar) |
| Res2Net200_vd_<br>26w_4s_ssld | 0.8513 | 0.9742 | 11.45 | 19.77 | 28.81 | 15.77 | 76.44 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Res2Net200_vd_26w_4s_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Res2Net200_vd_26w_4s_ssld_infer.tar) |
| ResNeXt50_<br>32x4d | 0.7775 | 0.9382 | 5.07 | 8.49 | 12.02 | 4.26 | 25.10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_32x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt50_32x4d_infer.tar) |
| ResNeXt50_vd_<br>32x4d | 0.7956 | 0.9462 | 5.29 | 8.68 | 12.33 | 4.50 | 25.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_32x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt50_vd_32x4d_infer.tar) |
| ResNeXt50_<br>64x4d | 0.7843 | 0.9413 | 9.39 | 13.97 | 20.56 | 8.02 | 45.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_64x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt50_64x4d_infer.tar) |
| ResNeXt50_vd_<br>64x4d | 0.8012 | 0.9486 | 9.75 | 14.14 | 20.84 | 8.26 | 45.31 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt50_vd_64x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt50_vd_64x4d_infer.tar) |
| ResNeXt101_<br>32x4d | 0.7865 | 0.9419 | 11.34 | 16.78 | 22.80 | 8.01 | 44.32 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt101_32x4d_infer.tar) |
| ResNeXt101_vd_<br>32x4d | 0.8033 | 0.9512 | 11.36 | 17.01 | 23.07 | 8.25 | 44.33 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_32x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt101_vd_32x4d_infer.tar) |
| ResNeXt101_<br>64x4d | 0.7835 | 0.9452 | 21.57 | 28.08 | 39.49 | 15.52 | 83.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_64x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt101_64x4d_infer.tar) |
| ResNeXt101_vd_<br>64x4d | 0.8078 | 0.9520 | 21.57 | 28.22 | 39.70 | 15.76 | 83.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_vd_64x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt101_vd_64x4d_infer.tar) |
| ResNeXt152_<br>32x4d | 0.7898 | 0.9433 | 17.14 | 25.11 | 33.79 | 11.76 | 60.15 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_32x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt152_32x4d_infer.tar) |
| ResNeXt152_vd_<br>32x4d | 0.8072 | 0.9520 | 16.99 | 25.29 | 33.85 | 12.01 | 60.17 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_32x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt152_vd_32x4d_infer.tar) |
| ResNeXt152_<br>64x4d | 0.7951 | 0.9471 | 33.07 | 42.05 | 59.13 | 23.03 | 115.27 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_64x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt152_64x4d_infer.tar) |
| ResNeXt152_vd_<br>64x4d | 0.8108 | 0.9534 | 33.30 | 42.41 | 59.42 | 23.27 | 115.29 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt152_vd_64x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt152_vd_64x4d_infer.tar) |
| SE_ResNet18_vd | 0.7333 | 0.9138 | 1.48 | 2.70 | 4.32 | 2.07 | 11.81 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet18_vd_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SE_ResNet18_vd_infer.tar) |
| SE_ResNet34_vd | 0.7651 | 0.9320 | 2.42 | 3.69 | 6.29 | 3.93 | 22.00 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet34_vd_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SE_ResNet34_vd_infer.tar) |
| SE_ResNet50_vd | 0.7952 | 0.9475 | 3.11 | 5.99 | 9.34 | 4.36 | 28.16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNet50_vd_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SE_ResNet50_vd_infer.tar) |
| SE_ResNeXt50_<br>32x4d | 0.7844 | 0.9396 | 6.39 | 11.01 | 14.94 | 4.27 | 27.63 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_32x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SE_ResNeXt50_32x4d_infer.tar) |
| SE_ResNeXt50_vd_<br>32x4d | 0.8024 | 0.9489 | 7.04 | 11.57 | 16.01 | 5.64 | 27.76 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt50_vd_32x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SE_ResNeXt50_vd_32x4d_infer.tar) |
| SE_ResNeXt101_<br>32x4d | 0.7939 | 0.9443 | 13.31 | 21.85 | 28.77 | 8.03 | 49.09 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SE_ResNeXt101_32x4d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SE_ResNeXt101_32x4d_infer.tar) |
| SENet154_vd | 0.8140 | 0.9548 | 34.83 | 51.22 | 69.74 | 24.45 | 122.03 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SENet154_vd_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SENet154_vd_infer.tar) |
<a name="7"></a>
## 7. DPN and DenseNet series <sup>[[14](#ref14)][[15](#ref15)]</sup>
The accuracy and speed metrics of the DPN and DenseNet series models are shown in the following table. For more information about this series of models, please refer to the [DPN and DenseNet series model documents](../models/DPN_DenseNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|-------------|-----------|-----------|-----------------------|----------------------|----------|-----------|--------------------------------------------------------------------------------------|-------------|-------------|
| DenseNet121 | 0.7566 | 0.9258 | 3.40 | 6.94 | 9.17 | 2.87 | 8.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet121_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DenseNet121_infer.tar) |
| DenseNet161 | 0.7857 | 0.9414 | 7.06 | 14.37 | 19.55 | 7.79 | 28.90 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet161_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DenseNet161_infer.tar) |
| DenseNet169 | 0.7681 | 0.9331 | 5.00 | 10.29 | 12.84 | 3.40 | 14.31 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet169_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DenseNet169_infer.tar) |
| DenseNet201 | 0.7763 | 0.9366 | 6.38 | 13.72 | 17.17 | 4.34 | 20.24 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet201_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DenseNet201_infer.tar) |
| DenseNet264 | 0.7796 | 0.9385 | 9.34 | 20.95 | 25.41 | 5.82 | 33.74 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DenseNet264_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DenseNet264_infer.tar) |
| DPN68 | 0.7678 | 0.9343 | 8.18 | 11.40 | 14.82 | 2.35 | 12.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN68_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DPN68_infer.tar) |
| DPN92 | 0.7985 | 0.9480 | 12.48 | 20.04 | 25.10 | 6.54 | 37.79 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN92_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DPN92_infer.tar) |
| DPN98 | 0.8059 | 0.9510 | 14.70 | 25.55 | 35.12 | 11.728 | 61.74 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN98_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DPN98_infer.tar) |
| DPN107 | 0.8089 | 0.9532 | 19.46 | 35.62 | 50.22 | 18.38 | 87.13 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN107_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DPN107_infer.tar) |
| DPN131 | 0.8070 | 0.9514 | 19.64 | 34.60 | 47.42 | 16.09 | 79.48 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DPN131_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DPN131_infer.tar) |
<a name="8"></a>
## 8. HRNet series <sup>[[13](#ref13)]</sup>
The accuracy and speed metrics of the HRNet series models are shown in the following table. For more information about this series of models, please refer to the [HRNet series model documents](../models/HRNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|-------------|-----------|-----------|------------------|------------------|----------|-----------|--------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|
| HRNet_W18_C | 0.7692 | 0.9339 | 6.66 | 8.94 | 11.95 | 4.32 | 21.35 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W18_C_infer.tar) |
| HRNet_W18_C_ssld | 0.81162 | 0.95804 | 6.66 | 8.94 | 11.95 | 4.32 | 21.35 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W18_C_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W18_C_ssld_infer.tar) |
| HRNet_W30_C | 0.7804 | 0.9402 | 8.61 | 11.40 | 15.23 | 8.15 | 37.78 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W30_C_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W30_C_infer.tar) |
| HRNet_W32_C | 0.7828 | 0.9424 | 8.54 | 11.58 | 15.57 | 8.97 | 41.30 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W32_C_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W32_C_infer.tar) |
| HRNet_W40_C | 0.7877 | 0.9447 | 9.83 | 15.02 | 20.92 | 12.74 | 57.64 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W40_C_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W40_C_infer.tar) |
| HRNet_W44_C | 0.7900 | 0.9451 | 10.62 | 16.18 | 25.92 | 14.94 | 67.16 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W44_C_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W44_C_infer.tar) |
| HRNet_W48_C | 0.7895 | 0.9442 | 11.07 | 17.06 | 27.28 | 17.34 | 77.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W48_C_infer.tar) |
| HRNet_W48_C_ssld | 0.8363 | 0.9682 | 11.07 | 17.06 | 27.28 | 17.34 | 77.57 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W48_C_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W48_C_ssld_infer.tar) |
| HRNet_W64_C | 0.7930 | 0.9461 | 13.82 | 21.15 | 35.51 | 28.97 | 128.18 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/HRNet_W64_C_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HRNet_W64_C_infer.tar) |
| SE_HRNet_W64_C_ssld | 0.8475 | 0.9726 | 17.11 | 26.87 | 43.24 | 29.00 | 129.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/SE_HRNet_W64_C_ssld_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SE_HRNet_W64_C_ssld_infer.tar) |
<a name="9"></a>
## 9. Inception series <sup>[[10](#ref10)][[11](#ref11)][[12](#ref12)][[26](#ref26)]</sup>
The accuracy and speed metrics of the Inception series models are shown in the following table. For more information about this series of models, please refer to the [Inception series model documents](../models/Inception_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|--------------------|-----------|-----------|-----------------------|----------------------|----------|-----------|---------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|
| GoogLeNet | 0.7070 | 0.8966 | 1.41 | 3.25 | 5.00 | 1.44 | 11.54 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/GoogLeNet_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/GoogLeNet_infer.tar) |
| Xception41 | 0.7930 | 0.9453 | 3.58 | 8.76 | 16.61 | 8.57 | 23.02 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Xception41_infer.tar) |
| Xception41_deeplab | 0.7955 | 0.9438 | 3.81 | 9.16 | 17.20 | 9.28 | 27.08 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception41_deeplab_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Xception41_deeplab_infer.tar) |
| Xception65 | 0.8100 | 0.9549 | 5.45 | 12.78 | 24.53 | 13.25 | 36.04 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Xception65_infer.tar) |
| Xception65_deeplab | 0.8032 | 0.9449 | 5.65 | 13.08 | 24.61 | 13.96 | 40.10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception65_deeplab_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Xception65_deeplab_infer.tar) |
| Xception71 | 0.8111 | 0.9545 | 6.19 | 15.34 | 29.21 | 16.21 | 37.86 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Xception71_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Xception71_infer.tar) |
| InceptionV3 | 0.7914 | 0.9459 | 4.78 | 8.53 | 12.28 | 5.73 | 23.87 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/InceptionV3_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/InceptionV3_infer.tar) |
| InceptionV4 | 0.8077 | 0.9526 | 8.93 | 15.17 | 21.56 | 12.29 | 42.74 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/InceptionV4_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/InceptionV4_infer.tar) |
<a name="10"></a>
## 10. EfficientNet and ResNeXt101_wsl series <sup>[[16](#ref16)][[17](#ref17)]</sup>
The accuracy and speed metrics of the EfficientNet and ResNeXt101_wsl series models are shown in the following table. For more information about this series of models, please refer to the [EfficientNet and ResNeXt101_wsl series model documents](../models/EfficientNet_and_ResNeXt101_wsl_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|---------------------------|-----------|-----------|------------------|------------------|----------|-----------|----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
| ResNeXt101_<br>32x8d_wsl | 0.8255 | 0.9674 | 13.55 | 23.39 | 36.18 | 16.48 | 88.99 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x8d_wsl_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt101_32x8d_wsl_infer.tar) |
| ResNeXt101_<br>32x16d_wsl | 0.8424 | 0.9726 | 21.96 | 38.35 | 63.29 | 36.26 | 194.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x16d_wsl_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt101_32x16d_wsl_infer.tar) |
| ResNeXt101_<br>32x32d_wsl | 0.8497 | 0.9759 | 37.28 | 76.50 | 121.56 | 87.28 | 469.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x32d_wsl_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt101_32x32d_wsl_infer.tar) |
| ResNeXt101_<br>32x48d_wsl | 0.8537 | 0.9769 | 55.07 | 124.39 | 205.01 | 153.57 | 829.26 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeXt101_32x48d_wsl_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeXt101_32x48d_wsl_infer.tar) |
| Fix_ResNeXt101_<br>32x48d_wsl | 0.8626 | 0.9797 | 55.01 | 122.63 | 204.66 | 313.41 | 829.26 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/Fix_ResNeXt101_32x48d_wsl_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/Fix_ResNeXt101_32x48d_wsl_infer.tar) |
| EfficientNetB0 | 0.7738 | 0.9331 | 1.96 | 3.71 | 5.56 | 0.40 | 5.33 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB0_infer.tar) |
| EfficientNetB1 | 0.7915 | 0.9441 | 2.88 | 5.40 | 7.63 | 0.71 | 7.86 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB1_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB1_infer.tar) |
| EfficientNetB2 | 0.7985 | 0.9474 | 3.26 | 6.20 | 9.17 | 1.02 | 9.18 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB2_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB2_infer.tar) |
| EfficientNetB3 | 0.8115 | 0.9541 | 4.52 | 8.85 | 13.54 | 1.88 | 12.324 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB3_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB3_infer.tar) |
| EfficientNetB4 | 0.8285 | 0.9623 | 6.78 | 15.47 | 24.95 | 4.51 | 19.47 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB4_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB4_infer.tar) |
| EfficientNetB5 | 0.8362 | 0.9672 | 10.97 | 27.24 | 45.93 | 10.51 | 30.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB5_infer.tar) |
| EfficientNetB6 | 0.8400 | 0.9688 | 17.09 | 43.32 | 76.90 | 19.47 | 43.27 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB6_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB6_infer.tar) |
| EfficientNetB7 | 0.8430 | 0.9689 | 25.91 | 71.23 | 128.20 | 38.45 | 66.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB7_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB7_infer.tar) |
| EfficientNetB0_<br>small | 0.7580 | 0.9258 | 1.24 | 2.59 | 3.92 | 0.40 | 4.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/EfficientNetB0_small_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/EfficientNetB0_small_infer.tar) |
<a name="11"></a>
## 11. ResNeSt and RegNet series <sup>[[24](#ref24)][[25](#ref25)]</sup>
The accuracy and speed metrics of the ResNeSt and RegNet series models are shown in the following table. For more information about this series of models, please refer to the [ResNeSt and RegNet series model documents](../models/ResNeSt_RegNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| ResNeSt50_<br>fast_1s1x64d | 0.8035 | 0.9528 | 2.73 | 5.33 | 8.24 | 4.36 | 26.27 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_fast_1s1x64d_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeSt50_fast_1s1x64d_infer.tar) |
| ResNeSt50 | 0.8083 | 0.9542 | 7.36 | 10.23 | 13.84 | 5.40 | 27.54 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNeSt50_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNeSt50_infer.tar) |
| RegNetX_4GF | 0.785 | 0.9416 | 6.46 | 8.48 | 11.45 | 4.00 | 22.23 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RegNetX_4GF_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RegNetX_4GF_infer.tar) |
<a name="12"></a>
## 12. ViT and DeiT series <sup>[[31](#ref31)][[32](#ref32)]</sup>
The accuracy and speed metrics of the ViT (Vision Transformer) and DeiT (Data-efficient Image Transformers) series models are shown in the following table. For more information about this series of models, please refer to the [ViT_and_DeiT series model documents](../models/ViT_and_DeiT_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|------------------------|------------------------|
| ViT_small_<br/>patch16_224 | 0.7769 | 0.9342 | 3.71 | 9.05 | 16.72 | 9.41 | 48.60 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_small_patch16_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_small_patch16_224_infer.tar) |
| ViT_base_<br/>patch16_224 | 0.8195 | 0.9617 | 6.12 | 14.84 | 28.51 | 16.85 | 86.42 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch16_224_infer.tar) |
| ViT_base_<br/>patch16_384 | 0.8414 | 0.9717 | 14.15 | 48.38 | 95.06 | 49.35 | 86.42 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch16_384_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch16_384_infer.tar) |
| ViT_base_<br/>patch32_384 | 0.8176 | 0.9613 | 4.94 | 13.43 | 24.08 | 12.66 | 88.19 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_base_patch32_384_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_base_patch32_384_infer.tar) |
| ViT_large_<br/>patch16_224 | 0.8323 | 0.9650 | 15.53 | 49.50 | 94.09 | 59.65 | 304.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch16_224_infer.tar) |
|ViT_large_<br/>patch16_384| 0.8513 | 0.9736 | 39.51 | 152.46 | 304.06 | 174.70 | 304.12 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch16_384_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch16_384_infer.tar) |
|ViT_large_<br/>patch32_384| 0.8153 | 0.9608 | 11.44 | 36.09 | 70.63 | 44.24 | 306.48 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ViT_large_patch32_384_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ViT_large_patch32_384_infer.tar) |
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|------------------------|------------------------|------------------------|------------------------|
| DeiT_tiny_<br>patch16_224 | 0.718 | 0.910 | 3.61 | 3.94 | 6.10 | 1.07 | 5.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_patch16_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_tiny_patch16_224_infer.tar) |
| DeiT_small_<br>patch16_224 | 0.796 | 0.949 | 3.61 | 6.24 | 10.49 | 4.24 | 21.97 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_patch16_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_small_patch16_224_infer.tar) |
| DeiT_base_<br>patch16_224 | 0.817 | 0.957 | 6.13 | 14.87 | 28.50 | 16.85 | 86.42 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_patch16_224_infer.tar) |
| DeiT_base_<br>patch16_384 | 0.830 | 0.962 | 14.12 | 48.80 | 97.60 | 49.35 | 86.42 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_patch16_384_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_patch16_384_infer.tar) |
| DeiT_tiny_<br>distilled_patch16_224 | 0.741 | 0.918 | 3.51 | 4.05 | 6.03 | 1.08 | 5.87 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_tiny_distilled_patch16_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_tiny_distilled_patch16_224_infer.tar) |
| DeiT_small_<br>distilled_patch16_224 | 0.809 | 0.953 | 3.70 | 6.20 | 10.53 | 4.26 | 22.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_small_distilled_patch16_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_small_distilled_patch16_224_infer.tar) |
| DeiT_base_<br>distilled_patch16_224 | 0.831 | 0.964 | 6.17 | 14.94 | 28.58 | 16.93 | 87.18 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_distilled_patch16_224_infer.tar) |
| DeiT_base_<br>distilled_patch16_384 | 0.851 | 0.973 | 14.12 | 48.76 | 97.09 | 49.43 | 87.18 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DeiT_base_distilled_patch16_384_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DeiT_base_distilled_patch16_384_infer.tar) |
<a name="13"></a>
## 13. RepVGG series <sup>[[36](#ref36)]</sup>
The accuracy and speed metrics of the RepVGG series models are shown in the following table. For more details, please refer to the [RepVGG series model documents](../models/RepVGG_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| RepVGG_A0 | 0.7131 | 0.9016 | | | | 1.36 | 8.31 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RepVGG_A0_infer.tar) |
| RepVGG_A1 | 0.7380 | 0.9146 | | | | 2.37 | 12.79 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A1_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RepVGG_A1_infer.tar) |
| RepVGG_A2 | 0.7571 | 0.9264 | | | | 5.12 | 25.50 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_A2_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RepVGG_A2_infer.tar) |
| RepVGG_B0 | 0.7450 | 0.9213 | | | | 3.06 | 14.34 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RepVGG_B0_infer.tar) |
| RepVGG_B1 | 0.7773 | 0.9385 | | | | 11.82 | 51.83 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RepVGG_B1_infer.tar) |
| RepVGG_B2 | 0.7813 | 0.9410 | | | | 18.38 | 80.32 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RepVGG_B2_infer.tar) |
| RepVGG_B1g2 | 0.7732 | 0.9359 | | | | 8.82 | 41.36 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g2_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RepVGG_B1g2_infer.tar) |
| RepVGG_B1g4 | 0.7675 | 0.9335 | | | | 7.31 | 36.13 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B1g4_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RepVGG_B1g4_infer.tar) |
| RepVGG_B2g4 | 0.7881 | 0.9448 | | | | 11.34 | 55.78 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B2g4_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RepVGG_B2g4_infer.tar) |
| RepVGG_B3g4 | 0.7965 | 0.9485 | | | | 16.07 | 75.63 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RepVGG_B3g4_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RepVGG_B3g4_infer.tar) |
<a name="14"></a>
## 14. MixNet series <sup>[[29](#ref29)]</sup>
The accuracy and speed metrics of the MixNet series models are shown in the following table. For more details, please refer to the [MixNet series model documents](../models/MixNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(M) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
| -------- | --------- | --------- | ---------------- | ---------------- | ----------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| MixNet_S | 0.7628 | 0.9299 | 2.31 | 3.63 | 5.20 | 252.977 | 4.167 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_S_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MixNet_S_infer.tar) |
| MixNet_M | 0.7767 | 0.9364 | 2.84 | 4.60 | 6.62 | 357.119 | 5.065 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_M_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MixNet_M_infer.tar) |
| MixNet_L | 0.7860 | 0.9437 | 3.16 | 5.55 | 8.03 | 579.017 | 7.384 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MixNet_L_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MixNet_L_infer.tar) |
<a name="15"></a>
## 15. ReXNet series <sup>[[30](#ref30)]</sup>
The accuracy and speed metrics of the ReXNet series models are shown in the following table. For more details, please refer to the [ReXNet series model documents](../models/ReXNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| ReXNet_1_0 | 0.7746 | 0.9370 | 3.08 | 4.15 | 5.49 | 0.415 | 4.84 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ReXNet_1_0_infer.tar) |
| ReXNet_1_3 | 0.7913 | 0.9464 | 3.54 | 4.87 | 6.54 | 0.68 | 7.61 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_3_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ReXNet_1_3_infer.tar) |
| ReXNet_1_5 | 0.8006 | 0.9512 | 3.68 | 5.31 | 7.38 | 0.90 | 9.79 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_1_5_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ReXNet_1_5_infer.tar) |
| ReXNet_2_0 | 0.8122 | 0.9536 | 4.30 | 6.54 | 9.19 | 1.56 | 16.45 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_2_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ReXNet_2_0_infer.tar) |
| ReXNet_3_0 | 0.8209 | 0.9612 | 5.74 | 9.49 | 13.62 | 3.44 | 34.83 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ReXNet_3_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ReXNet_3_0_infer.tar) |
<a name="16"></a>
## 16. SwinTransformer series <sup>[[27](#ref27)]</sup>
The accuracy and speed metrics of the SwinTransformer series models are shown in the following table. For more details, please refer to the [SwinTransformer series model documents](../models/SwinTransformer_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | 6.59 | 9.68 | 16.32 | 4.35 | 28.26 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_tiny_patch4_window7_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_tiny_patch4_window7_224_infer.tar) |
| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | 12.54 | 17.07 | 28.08 | 8.51 | 49.56 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_small_patch4_window7_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_small_patch4_window7_224_infer.tar) |
| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | 13.37 | 23.53 | 39.11 | 15.13 | 87.70 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window7_224_infer.tar) |
| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | 19.52 | 64.56 | 123.30 | 44.45 | 87.70 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window12_384_infer.tar) |
| SwinTransformer_base_patch4_window7_224<sup>[1]</sup> | 0.8487 | 0.9746 | 13.53 | 23.46 | 39.13 | 15.13 | 87.70 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window7_224_22kto1k_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window7_224_infer.tar) |
| SwinTransformer_base_patch4_window12_384<sup>[1]</sup> | 0.8642 | 0.9807 | 19.65 | 64.72 | 123.42 | 44.45 | 87.70 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_base_patch4_window12_384_22kto1k_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_base_patch4_window12_384_infer.tar) |
| SwinTransformer_large_patch4_window7_224<sup>[1]</sup> | 0.8596 | 0.9783 | 15.74 | 38.57 | 71.49 | 34.02 | 196.43 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window7_224_22kto1k_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_large_patch4_window7_224_22kto1k_infer.tar) |
| SwinTransformer_large_patch4_window12_384<sup>[1]</sup> | 0.8719 | 0.9823 | 32.61 | 116.59 | 223.23 | 99.97 | 196.43 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SwinTransformer_large_patch4_window12_384_22kto1k_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SwinTransformer_large_patch4_window12_384_22kto1k_infer.tar) |
[1] Pre-trained on the ImageNet22k dataset and then fine-tuned on the ImageNet1k dataset via transfer learning.
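As a rough illustration of this two-stage recipe, the minimal PaddlePaddle sketch below builds a classifier with a 21841-way head (the ImageNet22k label count), then keeps the backbone and re-initializes a 1000-way head before fine-tuning on ImageNet1k. The `ToyClassifier` class, layer sizes, and checkpoint path are illustrative assumptions, not PaddleClas code.

```python
import paddle
from paddle import nn

class ToyClassifier(nn.Layer):
    """Hypothetical stand-in for a 22k-pretrained backbone; not a PaddleClas model."""
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2D(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2D(1),
            nn.Flatten(),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

# Stage 1: a model pre-trained on ImageNet22k carries a 21841-way head.
model = ToyClassifier(num_classes=21841)
# In practice the weights would be restored from a checkpoint, e.g.:
# model.set_state_dict(paddle.load("pretrained_22k.pdparams"))  # placeholder path

# Stage 2: keep the backbone and replace the head with 1000 classes for ImageNet1k,
# then fine-tune the whole network on ImageNet1k.
model.head = nn.Linear(64, 1000)
logits = model(paddle.randn([1, 3, 224, 224]))
print(logits.shape)  # [1, 1000]
```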
<a name="17"></a>
## 17. LeViT series <sup>[[33](#ref33)]</sup>
The accuracy and speed metrics of the LeViT series models are shown in the following table. For more details, please refer to the [LeViT series model documents](../models/LeViT_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(M) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| LeViT_128S | 0.7598 | 0.9269 | | | | 281 | 7.42 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128S_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_128S_infer.tar) |
| LeViT_128 | 0.7810 | 0.9371 | | | | 365 | 8.87 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_128_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_128_infer.tar) |
| LeViT_192 | 0.7934 | 0.9446 | | | | 597 | 10.61 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_192_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_192_infer.tar) |
| LeViT_256 | 0.8085 | 0.9497 | | | | 1049 | 18.45 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_256_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_256_infer.tar) |
| LeViT_384 | 0.8191 | 0.9551 | | | | 2234 | 38.45 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/LeViT_384_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/LeViT_384_infer.tar) |
**Note**: The accuracy gap relative to the reference implementation is due to differences in data preprocessing and to using the non-distilled head as the output.
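For context, the hedged sketch below contrasts the two output choices: reference implementations of distillation-trained models such as DeiT typically combine a class head and a distillation head at inference, whereas the numbers above are reported from the class head alone. The feature dimension and head definitions are illustrative assumptions, not the actual LeViT implementation.

```python
import paddle
from paddle import nn

# Toy distilled classifier: distillation-trained ViT/LeViT variants carry a class
# head plus a separate distillation head (the 384-dim feature here is illustrative).
feature = paddle.randn([1, 384])
class_head = nn.Linear(384, 1000)
distill_head = nn.Linear(384, 1000)

# A common reference choice: average both heads at inference time.
logits_averaged = (class_head(feature) + distill_head(feature)) / 2

# The table above reports results from the class head only, which, together with
# preprocessing differences, accounts for the accuracy gap.
logits_reported = class_head(feature)
```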
<a name="18"></a>
## 18. Twins series <sup>[[34](#ref34)]</sup>
The accuracy and speed metrics of the Twins series models are shown in the following table. For more details, please refer to the [Twins series model documents](../models/Twins_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| pcpvt_small | 0.8082 | 0.9552 | 7.32 | 10.51 | 15.27 |3.67 | 24.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_small_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_small_infer.tar) |
| pcpvt_base | 0.8242 | 0.9619 | 12.20 | 16.22 | 23.16 | 6.44 | 43.83 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_base_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_base_infer.tar) |
| pcpvt_large | 0.8273 | 0.9650 | 16.47 | 22.90 | 32.73 | 9.50 | 60.99 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/pcpvt_large_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/pcpvt_large_infer.tar) |
| alt_gvt_small | 0.8140 | 0.9546 | 6.94 | 9.01 | 12.27 |2.81 | 24.06 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_small_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_small_infer.tar) |
| alt_gvt_base | 0.8294 | 0.9621 | 9.37 | 15.02 | 24.54 | 8.34 | 56.07 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_base_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_base_infer.tar) |
| alt_gvt_large | 0.8331 | 0.9642 | 11.76 | 22.08 | 35.12 | 14.81 | 99.27 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/alt_gvt_large_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/alt_gvt_large_infer.tar) |
**Note**: The accuracy gap relative to the reference implementation is due to differences in data preprocessing.
<a name="19"></a>
## 19. HarDNet series <sup>[[37](#ref37)]</sup>
The accuracy and speed metrics of the HarDNet series models are shown in the following table. For more details, please refer to the [HarDNet series model documents](../models/HarDNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| HarDNet39_ds | 0.7133 |0.8998 | 1.40 | 2.30 | 3.33 | 0.44 | 3.51 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet39_ds_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HarDNet39_ds_infer.tar) |
| HarDNet68_ds |0.7362 | 0.9152 | 2.26 | 3.34 | 5.06 | 0.79 | 4.20 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet68_ds_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HarDNet68_ds_infer.tar) |
| HarDNet68| 0.7546 | 0.9265 | 3.58 | 8.53 | 11.58 | 4.26 | 17.58 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet68_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HarDNet68_infer.tar) |
| HarDNet85 | 0.7744 | 0.9355 | 6.24 | 14.85 | 20.57 | 9.09 | 36.69 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/HarDNet85_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/HarDNet85_infer.tar) |
<a name="20"></a>
## 20. DLA series <sup>[[38](#ref38)]</sup>
The accuracy and speed metrics of the DLA series models are shown in the following table. For more details, please refer to the [DLA series model documents](../models/DLA_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| DLA102 | 0.7893 |0.9452 | 4.95 | 8.08 | 12.40 | 7.19 | 33.34 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA102_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DLA102_infer.tar) |
| DLA102x2 |0.7885 | 0.9445 | 19.58 | 23.97 | 31.37 | 9.34 | 41.42 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA102x2_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DLA102x2_infer.tar) |
| DLA102x| 0.781 | 0.9400 | 11.12 | 15.60 | 20.37 | 5.89 | 26.40 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA102x_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DLA102x_infer.tar) |
| DLA169 | 0.7809 | 0.9409 | 7.70 | 12.25 | 18.90 | 11.59 | 53.50 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA169_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DLA169_infer.tar) |
| DLA34 | 0.7603 | 0.9298 | 1.83 | 3.37 | 5.98 | 3.07 | 15.76 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA34_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DLA34_infer.tar) |
| DLA46_c |0.6321 | 0.853 | 1.06 | 2.08 | 3.23 | 0.54 | 1.31 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA46_c_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DLA46_c_infer.tar) |
| DLA60 | 0.7610 | 0.9292 | 2.78 | 5.36 | 8.29 | 4.26 | 22.08 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA60_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DLA60_infer.tar) |
| DLA60x_c | 0.6645 | 0.8754 | 1.79 | 3.68 | 5.19 | 0.59 | 1.33 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA60x_c_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DLA60x_c_infer.tar) |
| DLA60x | 0.7753 | 0.9378 | 5.98 | 9.24 | 12.52 | 3.54 | 17.41 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DLA60x_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DLA60x_infer.tar) |
<a name="21"></a>
## 21. RedNet series <sup>[[39](#ref39)]</sup>
The accuracy and speed metrics of the RedNet series models are shown in the following table. For more details, please refer to the [RedNet series model documents](../models/RedNet_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| RedNet26 | 0.7595 |0.9319 | 4.45 | 15.16 | 29.03 | 1.69 | 9.26 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet26_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RedNet26_infer.tar) |
| RedNet38 |0.7747 | 0.9356 | 6.24 | 21.39 | 41.26 | 2.14 | 12.43 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet38_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RedNet38_infer.tar) |
| RedNet50| 0.7833 | 0.9417 | 8.04 | 27.71 | 53.73 | 2.61 | 15.60 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet50_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RedNet50_infer.tar) |
| RedNet101 | 0.7894 | 0.9436 | 13.07 | 44.12 | 83.28 | 4.59 | 25.76 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet101_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RedNet101_infer.tar) |
| RedNet152 | 0.7917 | 0.9440 | 18.66 | 63.27 | 119.48 | 6.57 | 34.14 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/RedNet152_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/RedNet152_infer.tar) |
<a name="22"></a>
## 22. TNT series <sup>[[35](#ref35)]</sup>
The accuracy and speed metrics of the TNT series models are shown in the following table. For more details, please refer to the [TNT series model documents](../models/TNT_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
| ---------- | --------- | --------- | ---------------- | ---------------- | -------- | --------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| TNT_small | 0.8121 |0.9563 | | | 4.83 | 23.68 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/TNT_small_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/TNT_small_infer.tar) |
**Note**: For the TNT model, both `mean` and `std` in the `NormalizeImage` preprocessing step are set to 0.5.
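As a small sketch of what this means in practice (plain NumPy, not the actual PaddleClas pipeline), normalizing with `mean = std = 0.5` maps pixel values into roughly [-1, 1] instead of the ImageNet-statistics range used by most other models on this page:

```python
import numpy as np

# Dummy uint8 image; the real pipeline would use a decoded 224x224 crop.
img = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8).astype("float32")

# ImageNet-statistics normalization used by most models listed here.
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_std = np.array([0.229, 0.224, 0.225])
x_imagenet = (img / 255.0 - imagenet_mean) / imagenet_std

# TNT-style normalization: mean = std = 0.5 for every channel.
x_tnt = (img / 255.0 - 0.5) / 0.5
print(x_tnt.min(), x_tnt.max())  # close to -1.0 and 1.0
```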
<a name="23"></a>
## 23. Other models
The accuracy and speed metrics of AlexNet <sup>[[18](#ref18)]</sup>, the SqueezeNet series <sup>[[19](#ref19)]</sup>, the VGG series <sup>[[20](#ref20)]</sup>, DarkNet53 <sup>[[21](#ref21)]</sup>, and other models are shown in the following table. For more information, please refer to the [Other model documents](../models/Others_en.md).
| Model | Top-1 Acc | Top-5 Acc | time(ms)<br>bs=1 | time(ms)<br>bs=4 | time(ms)<br/>bs=8 | FLOPs(G) | Params(M) | Pretrained Model Download Address | Inference Model Download Address |
|------------------------|-----------|-----------|------------------|------------------|----------|-----------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| AlexNet | 0.567 | 0.792 | 0.81 | 1.50 | 2.33 | 0.71 | 61.10 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/AlexNet_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/AlexNet_infer.tar) |
| SqueezeNet1_0 | 0.596 | 0.817 | 0.68 | 1.64 | 2.62 | 0.78 | 1.25 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_0_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SqueezeNet1_0_infer.tar) |
| SqueezeNet1_1 | 0.601 | 0.819 | 0.62 | 1.30 | 2.09 | 0.35 | 1.24 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/SqueezeNet1_1_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/SqueezeNet1_1_infer.tar) |
| VGG11 | 0.693 | 0.891 | 1.72 | 4.15 | 7.24 | 7.61 | 132.86 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/VGG11_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/VGG11_infer.tar) |
| VGG13 | 0.700 | 0.894 | 2.02 | 5.28 | 9.54 | 11.31 | 133.05 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/VGG13_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/VGG13_infer.tar) |
| VGG16 | 0.720 | 0.907 | 2.48 | 6.79 | 12.33 | 15.470 | 138.35 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/VGG16_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/VGG16_infer.tar) |
| VGG19 | 0.726 | 0.909 | 2.93 | 8.28 | 15.21 | 19.63 | 143.66 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/VGG19_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/VGG19_infer.tar) |
| DarkNet53 | 0.780 | 0.941 | 2.79 | 6.42 | 10.89 | 9.31 | 41.65 | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/DarkNet53_pretrained.pdparams) | [Download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/DarkNet53_infer.tar) |
<a name='reference'></a>
## Reference
<a name="ref1">[1]</a> He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
<a name="ref2">[2]</a> He T, Zhang Z, Zhang H, et al. Bag of tricks for image classification with convolutional neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 558-567.
<a name="ref3">[3]</a> Howard A, Sandler M, Chu G, et al. Searching for mobilenetv3[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 1314-1324.
<a name="ref4">[4]</a> Sandler M, Howard A, Zhu M, et al. Mobilenetv2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 4510-4520.
<a name="ref5">[5]</a> Howard A G, Zhu M, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
<a name="ref6">[6]</a> Ma N, Zhang X, Zheng H T, et al. Shufflenet v2: Practical guidelines for efficient cnn architecture design[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 116-131.
<a name="ref7">[7]</a> Xie S, Girshick R, Dollár P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1492-1500.
<a name="ref8">[8]</a> Hu J, Shen L, Sun G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7132-7141.
<a name="ref9">[9]</a> Gao S, Cheng M M, Zhao K, et al. Res2net: A new multi-scale backbone architecture[J]. IEEE transactions on pattern analysis and machine intelligence, 2019.
<a name="ref10">[10]</a> Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.
<a name="ref11">[11]</a> Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Thirty-first AAAI conference on artificial intelligence. 2017.
<a name="ref12">[12]</a> Chollet F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1251-1258.
<a name="ref13">[13]</a> Wang J, Sun K, Cheng T, et al. Deep high-resolution representation learning for visual recognition[J]. arXiv preprint arXiv:1908.07919, 2019.
<a name="ref14">[14]</a> Chen Y, Li J, Xiao H, et al. Dual path networks[C]//Advances in neural information processing systems. 2017: 4467-4475.
<a name="ref15">[15]</a> Huang G, Liu Z, Van Der Maaten L, et al. Densely connected convolutional networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 4700-4708.
<a name="ref16">[16]</a> Tan M, Le Q V. Efficientnet: Rethinking model scaling for convolutional neural networks[J]. arXiv preprint arXiv:1905.11946, 2019.
<a name="ref17">[17]</a> Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.
<a name="ref18">[18]</a> Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]//Advances in neural information processing systems. 2012: 1097-1105.
<a name="ref19">[19]</a> Iandola F N, Han S, Moskewicz M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size[J]. arXiv preprint arXiv:1602.07360, 2016.
<a name="ref20">[20]</a> Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
<a name="ref21">[21]</a> Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 779-788.
<a name="ref22">[22]</a> Ding X, Guo Y, Ding G, et al. Acnet: Strengthening the kernel skeletons for powerful cnn via asymmetric convolution blocks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2019: 1911-1920.
<a name="ref23">[23]</a> Han K, Wang Y, Tian Q, et al. GhostNet: More features from cheap operations[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1580-1589.
<a name="ref24">[24]</a> Zhang H, Wu C, Zhang Z, et al. Resnest: Split-attention networks[J]. arXiv preprint arXiv:2004.08955, 2020.
<a name="ref25">[25]</a> Radosavovic I, Kosaraju R P, Girshick R, et al. Designing network design spaces[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10428-10436.
<a name="ref26">[26]</a> C.Szegedy, V.Vanhoucke, S.Ioffe, J.Shlens, and Z.Wojna. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567, 2015.
<a name="ref27">[27]</a> Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin and Baining Guo. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.
<a name="ref28">[28]</a>Cheng Cui, Tingquan Gao, Shengyu Wei, Yuning Du, Ruoyu Guo, Shuilong Dong, Bin Lu, Ying Zhou, Xueying Lv, Qiwen Liu, Xiaoguang Hu, Dianhai Yu, Yanjun Ma. PP-LCNet: A Lightweight CPU Convolutional Neural Network.
<a name="ref29">[29]</a>Mingxing Tan, Quoc V. Le. MixConv: Mixed Depthwise Convolutional Kernels.
<a name="ref30">[30]</a>Dongyoon Han, Sangdoo Yun, Byeongho Heo, YoungJoon Yoo. Rethinking Channel Dimensions for Efficient Model Design.
<a name="ref31">[31]</a>Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby. AN IMAGE IS WORTH 16X16 WORDS:
TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE.
<a name="ref32">[32]</a>Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Herve Jegou. Training data-efficient image transformers & distillation through attention.
<a name="ref33">[33]</a>Benjamin Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Herve Jegou, Matthijs Douze. LeViT: a Vision Transformer in ConvNets Clothing for Faster Inference.
<a name="ref34">[34]</a>Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen. Twins: Revisiting the Design of Spatial Attention in Vision Transformers.
<a name="ref35">[35]</a>Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang. Transformer in Transformer.
<a name="ref36">[36]</a>Xiaohan Ding, Xiangyu Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun. RepVGG: Making VGG-style ConvNets Great Again.
<a name="ref37">[37]</a>Ping Chao, Chao-Yang Kao, Yu-Shan Ruan, Chien-Hsiang Huang, Youn-Long Lin. HarDNet: A Low Memory Traffic Network.
<a name="ref38">[38]</a>Fisher Yu, Dequan Wang, Evan Shelhamer, Trevor Darrell. Deep Layer Aggregation.
<a name="ref39">[39]</a>Duo Lim Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen. Involution: Inverting the Inherence of Convolution for Visual Recognition.

@ -0,0 +1,78 @@
# Image Classification Task Introduction
## Catalogue
- [1. Dataset Introduction](#1)
- [1.1 ImageNet-1k](#1.1)
- [1.2 CIFAR-10/CIFAR-100](#1.2)
- [2. Image Classification Process](#2)
- [2.1 Data and its Preprocessing](#2.1)
- [2.2 Prepare the model](#2.2)
- [2.3 Train the model](#2.3)
- [2.4 Evaluate the model](#2.4)
- [3. Main Algorithms Introduction](#3)
Image Classification is a fundamental task that classifies the image by semantic information and assigns it to a specific label. Image Classification is the foundation of Computer Vision tasks, such as object detection, image segmentation, object tracking, and behavior analysis. Image Classification enjoys comprehensive applications, including face recognition and smart video analysis in the security and protection field, traffic scenario recognition in the traffic field, image retrieval and electronic photo album classification in the internet industry, and image recognition in the medical industry.
Generally speaking, Image Classification attempts to fully describe the whole image through feature engineering and assigns labels with a classifier, so how to extract the features of images is the essential part. Before deep learning, the most widely adopted classification method was the Bag of Words model. Image Classification based on deep learning, by contrast, can learn a hierarchical feature description through supervised or unsupervised learning, replacing manual feature selection. In recent years, Convolutional Neural Networks (CNNs) have achieved remarkable performance in the image field: they take raw pixel information as input to preserve as much information as possible, extract features through convolution, and directly output the classification result. This end-to-end approach performs well and has become widespread.
Image Classification is a basic but important field in computer vision, whose research results have had a lasting impact on the development of computer vision and even deep learning. Image Classification has many sub-fields, such as multi-label image classification and fine-grained image classification; here we only give a brief introduction to single-label image classification.
<a name="1"></a>
## 1. Dataset Introduction
<a name="1.1"></a>
### 1.1 ImageNet-1k
The ImageNet project is a large-scale visual database for the research of visual object recognition software. More than 14 million images have been annotated manually to point out objects in the picture, and at least 1 million images are provided with borders. ImageNet-1k is a subset of the ImageNet dataset, which contains 1000 categories. The training set contains 1281167 image data, and the validation set contains 50,000 image data. Since 2010, ImageNet began to hold an annual image classification competition, namely, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with ImageNet-1k as its specified dataset. To date, ImageNet-1k has become one of the most significant contributors to the development of computer vision, based on which numerous initial models of downstream computer vision tasks are trained.
<a name="1.2"></a>
### 1.2 CIFAR-10/CIFAR-100
The CIFAR-10 data set consists of 60,000 color images of 10 categories with an image resolution of 32x32, and each category has 6000 images, including 5000 in the training set and 1000 in the validation set. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The CIFAR-100 dataset is an extension of CIFAR-10 and consists of 60,000 color images of 100 classes with an image resolution of 32x32, and each class has 600 images, including 500 in the training set and 100 in the validation set. Researchers can try different algorithms quickly due to their small scale. These two data sets are also commonly used for testing the quality of models in image classification.
<a name="2"></a>
## 2. Image Classification Process
The prepared training data is correspondingly preprocessed and then passed through the image classification model. The output of the model and the ground-truth label are fed into a cross-entropy loss function, which defines the optimization objective of the model. An image classification model is obtained by repeatedly iterating over all the training images, computing the loss, back-propagating the gradient information to the model through an optimizer, and updating the model weights.
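To make the loop above concrete, the snippet below is a minimal schematic sketch of a training step written with the PaddlePaddle dygraph API. It is not PaddleClas's actual training code: the choice of ResNet50, the optimizer settings, and the random stand-in data loader are illustrative assumptions only.

```python
import paddle
from paddle.vision.models import resnet50

# Placeholder model, loss and optimizer chosen for illustration only.
model = resnet50(num_classes=1000)
loss_fn = paddle.nn.CrossEntropyLoss()
optimizer = paddle.optimizer.Momentum(learning_rate=0.1, momentum=0.9,
                                      parameters=model.parameters())

# A stand-in for a real DataLoader: two random batches of "images" and labels.
fake_loader = [(paddle.randn([8, 3, 224, 224]),
                paddle.randint(0, 1000, [8])) for _ in range(2)]

for images, labels in fake_loader:
    logits = model(images)            # forward pass through the classifier
    loss = loss_fn(logits, labels)    # cross-entropy against the ground-truth labels
    loss.backward()                   # propagate gradient information back
    optimizer.step()                  # update the model weights
    optimizer.clear_grad()            # reset gradients for the next batch
```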
<a name="2.1"></a>
### 2.1 Data and its Preprocessing
The quality and quantity of data often determine the performance of a model. In the field of image classification, data includes images and labels. In most cases, labeled data is too scarce for the model to saturate. In order to enable the model to learn more image features, a lot of image transformation or data augmentation is required before the image enters the model, so as to ensure the diversity of the input data and hence the better generalization capability of the model. PaddleClas provides standard image transformations for training ImageNet-1k, as well as 8 data augmentation methods. For related code, please refer to [Data Preprocess](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/data/preprocess), and for the configuration files to [Data Augmentation Configuration File](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/ImageNet/DataAugment).
<a name="2.2"></a>
### 2.2 Prepare the Model
After the data is settled, the model often determines the upper limit of the final accuracy. In the field of image classification, classic models emerge endlessly. PaddleClas provides 36 series of models, a total of 164 ImageNet pre-trained models. For specific accuracy, speed and other indicators, please refer to [Backbone Network Introduction](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/models).
<a name="2.3"></a>
### 2.3 Train the Model
After preparing the data and model, you can start training the model and updating the parameters of the model. After many iterations, a trained model can finally be obtained for image classification tasks. The training process of image classification requires a lot of experience and involves the setting of many hyperparameters. PaddleClas provides a series of [training tuning methods](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/models/Tricks_en.md), which can help you quickly obtain a high-precision model.
<a name="2.4"></a>
### 2.4 Evaluate the Model
After a model is trained, the evaluation results of the model on the validation set can determine the performance of the model. The evaluation index is generally Top1-Acc or Top5-Acc, and the higher the index, the better the model performance.
<a name="3"></a>
## 3. Main Algorithms Introduction
- LeNet: Yan LeCun et al. first applied convolutional neural networks to image classification tasks in the 1990s, and creatively proposed LeNet, which achieved great success in handwritten digit recognition tasks.
- AlexNet: Alex Krizhevsky et al. proposed AlexNet in 2012 and applied it to ImageNet, winning the 2012 ImageNet classification competition and triggering the subsequent boom of deep learning.
- VGG: Simonyan and Zisserman put forward the VGG network structure in 2014. This network structure uses a smaller convolution kernel to stack the entire network, achieving better performance in ImageNet classification and providing new ideas for the subsequent network structure design.
- GoogLeNet: Christian Szegedy et al. presented GoogLeNet in 2014. This network uses a multi-branch structure and a global average pooling (GAP) layer. While maintaining accuracy, it drastically reduces the model's storage and computation. The network won the 2014 ImageNet classification competition.
- ResNet: Kaiming He et al. delivered ResNet in 2015, which deepened the network by introducing a residual module, reducing the ImageNet classification error rate to 3.6% and surpassing the recognition accuracy of normal human eyes for the first time.
- DenseNet: Huang Gao et al. proposed DenseNet in 2017. The network designed densely connected blocks and achieved higher performance with fewer parameters.
- EfficientNet: Mingxing Tan et al. introduced EfficientNet in 2019. This network balances the width of the network, the depth of the network, and the resolution of the input image. With the same FLOPs and number of parameters, it achieves state-of-the-art results.
For more algorithm introduction, please refer to [Algorithm Introduction](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/models).

@ -0,0 +1,12 @@
algorithm_introduction
================================
.. toctree::
:maxdepth: 2
image_classification_en.md
metric_learning_en.md
knowledge_distillation_en.md
model_prune_quantization_en.md
ImageNet_models_en.md
DataAugmentation_en.md

@ -0,0 +1,94 @@
# Knowledge Distillation
---
## Content
* [1. Introduction of model compression methods](#1)
* [2. Application of knowledge distillation](#2)
* [3. Overview of knowledge distillation methods](#3)
* [3.1 Response based distillation](#3.1)
* [3.2 Feature based distillation](#3.2)
* [3.3 Relation based distillation](#3.3)
* [4. Reference](#4)
<a name='1'></a>
## 1. Introduction of model compression methods
In recent years, deep neural networks have proven to be an extremely effective method for solving problems in computer vision and natural language processing. A suitable neural network architecture usually performs better than traditional algorithms.
When the amount of data is large enough, increasing the number of model parameters in a reasonable way can significantly improve model performance, but this sharply increases model complexity, and larger models are more expensive to store and deploy.
Parameter redundancy generally exists in deep neural networks. At present, there are several mainstream methods to compress models and reduce their parameters, such as pruning, quantization, and knowledge distillation. Knowledge distillation refers to using a teacher model to guide a student model to learn a specific task, so that the small model achieves relatively strong performance, sometimes even comparable to that of the large model [1].
Currently, knowledge distillation methods can be roughly divided into the following three types.
* Response based distillation: the output of the student model is guided by the output of the teacher model.
* Feature based distillation: the inner feature maps of the student model are guided by those of the teacher model.
* Relation based distillation: for different samples, the teacher model and the student model are used to compute the correlation of the feature maps between samples; the final goal is to make the correlation matrices of the student model and the teacher model as consistent as possible.
<a name='2'></a>
## 2. Application of knowledge distillation
The knowledge distillation algorithm is widely used for making models lightweight. For tasks that need to meet a specific accuracy, the knowledge distillation method makes it possible to reach the required accuracy with a smaller model, thereby reducing model deployment cost.
What's more, for the same model structure, pre-trained models obtained by knowledge distillation often perform better, and these pre-trained models can also improve the performance of downstream tasks. For example, a pre-trained image classification model with higher accuracy can also bring significant accuracy gains to other tasks such as object detection, image segmentation, OCR, and video classification.
<a name='3'></a>
## 3. Overview of knowledge distillation methods
<a name='3.1'></a>
### 3.1 Response based distillation
The knowledge distillation algorithm was first proposed by Hinton and is called KD. In addition to the base cross-entropy loss, a KL divergence loss between the outputs of the student model and the teacher model is added to the total training loss. Note that a larger teacher model is needed to guide the training process of the student model.
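The function below is a minimal sketch of this vanilla KD loss, combining the hard-label cross-entropy with a temperature-softened KL divergence term; the temperature `T` and weight `alpha` are illustrative values, not PaddleClas defaults.

```python
import paddle.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Base cross entropy on hard labels plus KL divergence between
    temperature-softened teacher and student outputs (vanilla KD sketch)."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(F.log_softmax(student_logits / T, axis=-1),
                  F.softmax(teacher_logits / T, axis=-1),
                  reduction='mean') * T * T   # rescale to keep gradient magnitudes comparable
    return (1.0 - alpha) * ce + alpha * kl
```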
PaddleClas proposed a simple but effective knowledge distillation algorithm called SSLD [6]. Labels are not needed for SSLD, so unlabeled data can also be used for training. With SSLD, the accuracy of 15 models was improved by more than 3%.
A teacher model is needed for the above-mentioned distillation methods to guide the training process of the student model. Deep Mutual Learning (DML) was then proposed [7], in which two models with the same architecture learn from each other to obtain higher accuracy. Compared with KD and other knowledge distillation algorithms that rely on a large teacher model, DML is free of this dependence, and its distillation training process is simpler.
<a name='3.2'></a>
### 3.2 Feature based distillation
Heo et al. proposed OverHaul [8], which computes the distance between the feature maps of the student model and the teacher model and uses it as the distillation loss. Here, the feature maps of the student model and the teacher model are aligned so that their distance can be calculated.
Feature based distillation can also be integrated with the response based knowledge distillation algorithms in section 3.1, which means both the inner feature maps and the output of the student model are guided during the training process. For the DML method, this integration is simpler, because no alignment is needed: the two models' architectures are exactly the same. This integrated approach is used in the PP-OCRv2 system and greatly improves the accuracy of the OCR text recognition model.
<a name='3.3'></a>
### 3.3 Relation based distillation
The papers in sections `3.1` and `3.2` mainly consider the inner feature maps or the final output of the student model and the teacher model. These knowledge distillation algorithms only focus on the output for a single sample and do not consider the relationship between the outputs of different samples.
Park et al. proposed RKD [10], a relation-based knowledge distillation algorithm. RKD further considers the relationship between different samples and uses two loss functions: the second-order distance-wise loss and the third-order angle-wise loss. The final distillation loss combines the KD loss and the RKD loss, and the resulting accuracy is better than that obtained with the KD loss alone.
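As a rough sketch of the distance-wise term only (the angle-wise term and the exact normalization used in the paper are omitted), the following compares the normalized pairwise-distance matrices of the student and teacher embeddings:

```python
import paddle
import paddle.nn.functional as F

def rkd_distance_loss(student_emb, teacher_emb):
    """Distance-wise relational loss: match the normalized pairwise
    distance structure of the student embeddings to the teacher's."""
    def normalized_pdist(e):
        # pairwise Euclidean distances between all embeddings in the batch
        d = paddle.norm(e.unsqueeze(1) - e.unsqueeze(0), p=2, axis=-1)
        mean_d = paddle.masked_select(d, d > 0).mean()   # mean of the non-zero distances
        return d / (mean_d + 1e-12)

    return F.smooth_l1_loss(normalized_pdist(student_emb),
                            normalized_pdist(teacher_emb))
```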
<a name='4'></a>
## 4. Reference
[1] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[2] Bagherinezhad H, Horton M, Rastegari M, et al. Label refinery: Improving imagenet classification through label progression[J]. arXiv preprint arXiv:1805.02641, 2018.
[3] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.
[4] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.
[5] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260.
[6] Cui C, Guo R, Du Y, et al. Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones[J]. arXiv preprint arXiv:2103.05959, 2021.
[7] Zhang Y, Xiang T, Hospedales T M, et al. Deep mutual learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4320-4328.
[8] Heo B, Kim J, Yun S, et al. A comprehensive overhaul of feature distillation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 1921-1930.
[9] Du Y, Li C, Guo R, et al. PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System[J]. arXiv preprint arXiv:2109.03144, 2021.
[10] Park W, Kim D, Lu Y, et al. Relational knowledge distillation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3967-3976.

@ -0,0 +1,40 @@
# Metric Learning
## Catalogue
- [1.Introduction](#1)
- [2.Applications](#2)
- [3.Algorithms](#3)
- [3.1 Classification based](#3.1)
- [3.2 Pairwise based](#3.2)
<a name="1"></a>
## 1.Introduction
Measuring the distance between data points is a common practice in machine learning. Generally speaking, Euclidean distance, inner product, or cosine similarity can be computed directly on structured (vector) data. However, the same operations can hardly be applied to unstructured data, for example, when estimating the compatibility between a video and a piece of music. Despite the difficulty of performing such vector operations directly due to varied data formats, prior knowledge tells us that ED(laugh_video, laugh_music) < ED(laugh_video, blue_music). How can this "distance" be characterized effectively? This is exactly the focus of Metric Learning.
Metric Learning, also known as Distance Metric Learning, automatically constructs a task-specific metric function from training data. As shown in the figure below, the goal of Metric Learning is to learn a transformation function (either linear or nonlinear) L that maps data points from the original vector space to a new one in which similar points are closer together and dissimilar points are further apart, making the metric more task-appropriate. Deep Metric Learning fits this transformation function with a deep neural network. ![example](../../images/ml_illustration.jpg)
<a name="2"></a>
## 2.Applications
Metric Learning technologies are widely applied in real life, such as Face Recognition, Person ReID, Image Retrieval, Fine-grained classification, etc. With the growing prevalence of deep learning in industrial practice, Deep Metric Learning (DML) emerges as the current research direction.
Normally, DML consists of three parts: a feature extraction network for map embedding, a sampling strategy to combine samples in a mini-batch into multiple sub-sets, and a loss function to compute the loss on each sub-set. Please refer to the figure below: ![image](../../images/ml_pipeline.jpg)
<a name="3"></a>
## 3.Algorithms
Two learning paradigms are adopted in Metric Learning:
<a name="3.1"></a>
### 3.1 Classification based:
This refers to methods based on classification labels. They learn the effective feature representation by classifying each sample into the correct category and require the participation of the explicit labels of each sample in the Loss calculation during the learning process. Common algorithms include [L2-Softmax](https://arxiv.org/abs/1703.09507), [Large-margin Softmax](https://arxiv.org/abs/1612.02295), [Angular Softmax](https://arxiv.org/pdf/1704.08063.pdf), [NormFace](https://arxiv.org/abs/1704.06369), [AM-Softmax](https://arxiv.org/abs/1801.05599), [CosFace](https://arxiv.org/abs/1801.09414), [ArcFace](https://arxiv.org/abs/1801.07698), etc. These methods are also called proxy-based, because what they optimize is essentially the similarity between a sample and a set of proxies.
<a name="3.2"></a>
### 3.2 Pairwise based:
This refers to the learning paradigm based on paired samples. It takes sample pairs as input and obtains an effective feature representation by directly learning the similarity between these pairs. Common algorithms include [Contrastive loss](http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf), [Triplet loss](https://arxiv.org/abs/1503.03832), [Lifted-Structure loss](https://arxiv.org/abs/1511.06452), N-pair loss, [Multi-Similarity loss](https://arxiv.org/pdf/1904.06627.pdf), etc.
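As an illustration of the pairwise paradigm, the sketch below computes a plain triplet loss on L2-normalized embeddings; the margin value is an arbitrary choice for the example, not a recommended setting.

```python
import paddle
import paddle.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Pull the anchor closer to the positive than to the negative by at least `margin`."""
    # L2-normalize the embeddings so distances are comparable across samples
    anchor, positive, negative = (F.normalize(t, axis=1) for t in (anchor, positive, negative))
    d_ap = paddle.norm(anchor - positive, p=2, axis=1)   # anchor-positive distance
    d_an = paddle.norm(anchor - negative, p=2, axis=1)   # anchor-negative distance
    return F.relu(d_ap - d_an + margin).mean()           # hinge on the distance gap
```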
[CircleLoss](https://arxiv.org/abs/2002.10857), released in 2020, unifies the two learning paradigms from a fresh perspective, prompting researchers and practitioners' further reflection on Metric Learning.

@ -0,0 +1,52 @@
# Algorithms of Model Pruning and Quantization
The computational complexity and parameter redundancy of deep learning models limit their deployment in some scenarios and on some devices, so model compression and optimization acceleration are required. Model compression algorithms can effectively reduce parameter redundancy, thereby cutting storage footprint, communication bandwidth, and computational complexity, which is conducive to the application and deployment of deep learning models. Among them, model quantization and pruning enjoy great popularity. In PaddleClas, the following two algorithms are mainly applied.
- Quantization: PACT
- Pruning: FPGM
See [PaddeSlim](https://github.com/PaddlePaddle/PaddleSlim/) for detailed parameters.
## Catalogue
- [1. PACT](#1)
- [2. FPGM](#2)
<a name='1'></a>
## 1. PACT
Model quantization comprises two main parts: quantization of the weights and quantization of the activations. Quantizing both parts is necessary to maximize the computational efficiency gain. The weights can be made to distribute as compactly as possible by means of network regularization, reducing outliers and uneven distribution, while there is a lack of effective means for the activations.
**PACT (PArameterized Clipping acTivation)** is a new quantization method that minimizes the loss of accuracy, or even achieves higher accuracy, by removing some outliers before the activations are quantized. The method was proposed after the authors found that "the quantized activations differ significantly from the full-precision results when weight quantization is adopted". The authors also found that the quantization of activations can cause a large error (because of ReLU, the range of activations is unbounded, whereas the weights are basically within 0 to 1), so the **clipped ReLU** activation function was introduced. The clipping ceiling, i.e. $\alpha$, is a learnable parameter, which ensures that each layer can learn a different quantization range through training and minimizes the rounding error caused by quantization. The schematic diagram of quantization is shown below. **PACT** solves the problem by continuously trimming the activation range so that the activation distribution is narrowed, thus reducing the quantization mapping loss. By clipping the activations it reduces the outliers in the activation distribution, acquires a more reasonable quantization scale, and cuts the quantization loss.
![img](../../images/algorithm_introduction/quantization.jpg)
The quantization formula of **PACT** is as follows:
![img](../../images/algorithm_introduction/quantization_formula.png)
It is shown that PACT adopts the above quantization as a substitute for the *ReLU* function, clipping the part greater than zero with a threshold $\alpha$. However, the above formula is further improved in *PaddleSlim* as follows:
![img](../../images/algorithm_introduction/quantization_formula_slim.png)
After the above improvement, the *PACT* preprocessing is inserted between the activation and the OP to be quantized (convolution, fully-connected layer, etc.), which not only clips the part of the distribution greater than 0 but also does the same for the part less than 0, so as to better cover the range to be quantized and minimize the quantization loss. At the same time, the clipping threshold is a trainable parameter that the model can learn automatically during quantization-aware training, further lowering the loss of quantization accuracy.
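The layer below is only a minimal sketch of the original PACT clipped activation with a learnable ceiling $\alpha$; PaddleSlim's quantization-aware training applies the improved two-sided preprocessing described above internally, so in practice it is configured through PaddleSlim's quantization APIs (see the link below) rather than written by hand.

```python
import paddle
import paddle.nn as nn

class PACTClip(nn.Layer):
    """Clipped activation y = clip(x, 0, alpha) = 0.5 * (|x| - |x - alpha| + alpha),
    where the clipping ceiling alpha is a learnable parameter."""

    def __init__(self, init_alpha=6.0):
        super().__init__()
        self.alpha = self.create_parameter(
            shape=[1], default_initializer=nn.initializer.Constant(init_alpha))

    def forward(self, x):
        return 0.5 * (paddle.abs(x) - paddle.abs(x - self.alpha) + self.alpha)
```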
For specific algorithm parameters, please refer to [Introduction to Parameters](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0.0/docs/zh_cn/api_cn/dygraph/quanter/qat.rst#qat) in PaddleSlim.
<a name='2'></a>
## 2. FPGM
Model pruning is an essential practice for reducing model size and improving inference efficiency. In previous work on network pruning, the norm of a network filter is generally adopted to measure its importance: **the smaller the norm, the less important the filter** and the safer it is to prune it from the network. **FPGM** argues that this approach relies on the following two conditions:
- The deviation of the filter's norm should be large so that important and unimportant filters can be well separated
- The norm of the unimportant filter should be small enough
Based on this, **FPGM** takes advantage of the geometric-median property of the filters: since filters near the geometric median can be expressed by the others, they can be eliminated, which avoids the above two pruning conditions. As a result, pruning is conducted based on the redundancy of information rather than on a small norm. The following figure shows how **FPGM** differs from the previous method; see the [paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/He_Filter_Pruning_via_Geometric_Median_) for more details.
![img](../../images/algorithm_introduction/fpgm.png)
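The function below is only a rough sketch of the ranking idea behind FPGM (filters whose total distance to all other filters is smallest lie near the geometric median and are the first pruning candidates); the production pruner is PaddleSlim's `FPGMFilterPruner`, configured as described in the link below.

```python
import paddle

def fpgm_filter_ranking(conv_weight):
    """Rank the filters of a conv layer by their total distance to all other
    filters; the smallest scores are the most redundant (pruning candidates).

    conv_weight: tensor of shape [out_channels, in_channels, kh, kw].
    """
    flat = conv_weight.reshape([conv_weight.shape[0], -1])                   # [C_out, D]
    dist = paddle.norm(flat.unsqueeze(1) - flat.unsqueeze(0), p=2, axis=-1)  # [C_out, C_out]
    score = dist.sum(axis=1)              # total distance of each filter to the others
    return paddle.argsort(score)          # ascending order: prune the first entries
```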
For specific algorithm parameters, please refer to [Introduction to Parameters](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0.0/docs/zh_cn/api_cn/dygraph/pruners/fpgm_filter_pruner.rst#fpgmfilterpruner) in PaddleSlim.

@ -0,0 +1,3 @@
# Release Notes
* 2020.04.14: first commit

@ -0,0 +1,65 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import sphinx_rtd_theme
from recommonmark.parser import CommonMarkParser
# -- Project information -----------------------------------------------------
project = 'PaddleClas-en'
copyright = '2022, PaddleClas'
author = 'PaddleClas'
# The full version, including alpha/beta/rc tags
release = '2.3'
# -- General configuration ---------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
source_parsers = {
'.md': CommonMarkParser,
}
source_suffix = ['.rst', '.md']
extensions = [
'recommonmark',
'sphinx_markdown_tables'
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The root document.
root_doc = 'doc_en'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# Change the documentation color scheme
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

@ -0,0 +1,130 @@
# Image Classification Datasets
This document elaborates on the dataset format adopted by PaddleClas for image classification tasks, as well as other common datasets in this field.
------
## Catalogue
- [1.Dataset Format](#1)
- [2.Common Datasets for Image Classification](#2)
- [2.1 ImageNet1k](#2.1)
- [2.2 Flowers102](#2.2)
- [2.3 CIFAR10 / CIFAR100](#2.3)
- [2.4 MNIST](#2.4)
- [2.5 NUS-WIDE](#2.5)
<a name="1"></a>
## 1.Dataset Format
PaddleClas adopts `txt` files to assign the training and test sets. Taking the `ImageNet1k` dataset as an example, where `train_list.txt` and `val_list.txt` have the following formats:
```
# Separate the image path and annotation with "space" for each line
# train_list.txt has the following format
train/n01440764/n01440764_10026.JPEG 0
...
# val_list.txt has the following format
val/ILSVRC2012_val_00000001.JPEG 65
...
```
<a name="2"></a>
## 2.Common Datasets for Image Classification
Here we present a compilation of commonly used image classification datasets, which is continuously updated and welcomes your contributions.
<a name="2.1"></a>
### 2.1 ImageNet1k
[ImageNet](https://image-net.org/) is a large visual database for visual target recognition research with over 14 million manually labeled images. ImageNet-1k is a subset of the ImageNet dataset, which contains 1000 categories with 1281167 images for the training set and 50000 for the validation set. Since 2010, ImageNet began to hold an annual image classification competition, namely, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) with ImageNet-1k as its specified dataset. To date, ImageNet-1k has become one of the most significant contributors to the development of computer vision, based on which numerous initial models of downstream computer vision tasks are trained.
| Dataset | Size of Training Set | Size of Test Set | Number of Category | Note |
| ------------------------------------------------------------ | -------------------- | ---------------- | ------------------ | ---- |
| [ImageNet1k](http://www.image-net.org/challenges/LSVRC/2012/) | 1.2M | 50k | 1000 | |
After downloading the data from official sources, organize it in the following format to train with the ImageNet1k dataset in PaddleClas.
```
PaddleClas/dataset/ILSVRC2012/
|_ train/
| |_ n01440764
| | |_ n01440764_10026.JPEG
| | |_ ...
| |_ ...
| |
| |_ n15075141
| |_ ...
| |_ n15075141_9993.JPEG
|_ val/
| |_ ILSVRC2012_val_00000001.JPEG
| |_ ...
| |_ ILSVRC2012_val_00050000.JPEG
|_ train_list.txt
|_ val_list.txt
```
<a name="2.2"></a>
### 2.2 Flowers102
| Dataset | Size of Training Set | Size of Test Set | Number of Category | Note |
| ------------------------------------------------------------ | -------------------- | ---------------- | ------------------ | ---- |
| [flowers102](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | 1k | 6k | 102 | |
Unzip the downloaded data to see the following directory.
```
jpg/
setid.mat
imagelabels.mat
```
Place the files above under `PaddleClas/dataset/flowers102/` .
Run `generate_flowers102_list.py` to generate `train_list.txt` and `val_list.txt`:
```
python generate_flowers102_list.py jpg train > train_list.txt
python generate_flowers102_list.py jpg valid > val_list.txt
```
Structure the data as follows
```
PaddleClas/dataset/flowers102/
|_ jpg/
| |_ image_03601.jpg
| |_ ...
| |_ image_02355.jpg
|_ train_list.txt
|_ val_list.txt
```
<a name="2.3"></a>
### 2.3 CIFAR10 / CIFAR100
The CIFAR-10 dataset comprises 60,000 color images of 10 classes with 32x32 image resolution, each with 6,000 images including 5,000 images in the training set and 1,000 images in the validation set. The 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. The CIFAR-100 dataset is an extension of CIFAR-10 and consists of 60,000 color images of 100 classes with 32x32 image resolution, each with 600 images including 500 images in the training set and 100 images in the validation set.
Website: http://www.cs.toronto.edu/~kriz/cifar.html
<a name="2.4"></a>
### 2.4 MNIST
MNIST is a renowned dataset for handwritten digit recognition and is used as an introductory example for deep learning in many sources. It contains 70,000 images in total: 60,000 for the training set and 10,000 for the validation set, each with a size of 28 x 28.
Website: http://yann.lecun.com/exdb/mnist/
<a name="2.5"></a>
### 2.5 NUS-WIDE
NUS-WIDE is a multi-category dataset. It contains 269,648 images and 81 categories with each image being labeled as one or more of the 81 categories.
Website: https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html

@ -0,0 +1,8 @@
data_preparation
================================
.. toctree::
:maxdepth: 2
recognition_dataset_en.md
classification_dataset_en.md

@ -0,0 +1,141 @@
# Image Recognition Datasets
This document elaborates on the dataset format adopted by PaddleClas for image recognition tasks, as well as other common datasets in this field.
------
## Catalogue
- [1.Dataset Format](#1)
- [2.Common Datasets for Image Recognition](#2)
- [2.1 General Datasets](#2.1)
- [2.2 Vertical Class Datasets](#2.2)
- [2.2.1 Animation Character Recognition](#2.2.1)
- [2.2.2 Product Recognition](#2.2.2)
- [2.2.3 Logo Recognition](#2.2.3)
- [2.2.4 Vehicle Recognition](#2.2.4)
<a name="1"></a>
## 1.Dataset Format
The dataset for the vector search, unlike those for classification tasks, is divided into the following three parts:
- Train dataset: Used to train the model to learn the image features involved.
- Gallery dataset: Used to provide the gallery data in the vector search task. It can either be the same as the train or query datasets or different, and when it is the same as the train dataset, the category system of the query dataset and train dataset should be the same.
- Query dataset: Used to test the performance of the model. It usually extracts features from each query image of the dataset, followed by distance matching with those in the gallery dataset to get the recognition results, based on which the metrics of the whole query dataset are calculated.
The above three datasets all adopt `txt` files for assignment. Taking the `CUB_200_2011` dataset as an example, the `train_list.txt` of the train dataset has the following format
```
# Use "space" as the separator
...
train/99/Ovenbird_0136_92859.jpg 99 2
...
train/99/Ovenbird_0128_93366.jpg 99 6
...
```
The `test_list.txt` of the query dataset (both gallery dataset and query dataset in`CUB_200_2011`) has the following format
```
# Use "space" as the separator
...
test/200/Common_Yellowthroat_0126_190407.jpg 200 1
...
test/200/Common_Yellowthroat_0114_190501.jpg 200 6
...
```
The columns in each row are separated by a space, and the three columns stand for the path, label information, and unique id of the training data.
**Note**
1. When the gallery dataset and query dataset are the same, each image should have a unique id (ensuring that every image has a different id, which can simply be the row number), so that during the subsequent evaluation of mAP, recall@1, and other metrics, the first retrieved result (the query image itself, which requires no evaluation) can be removed. The corresponding dataset class in the yaml configuration file is `VeriWild`.
2. When the gallery dataset and query dataset are different, there is no need to add a unique id. Both `query_list.txt` and `gallery_list.txt` contain two columns: the path and the label information of the data. The corresponding dataset class in the yaml configuration file is `ImageNetDataset`.
<a name="2"></a>
## 2.Common Datasets for Image Recognition
Here we present a compilation of commonly used image recognition datasets, which is continuously updated and welcomes your contributions.
<a name="2.1"></a>
### 2.1 General Datasets
- SOP: The SOP dataset is a common product dataset in general recognition research and Metric Learning research, which contains 120,053 images of 22,634 products downloaded from eBay.com. There are 59,551 images of 11,318 categories in the training set and 60,502 images of 11,316 categories in the validation set.
Website: https://cvgl.stanford.edu/projects/lifted_struct/
- Cars196: The Cars dataset contains 16,185 images of 196 categories of cars, split into 8,144 training images and 8,041 query images, with each category divided roughly 50-50. Categories are typically at the level of make, model, and year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe.
Website: https://ai.stanford.edu/~jkrause/cars/car_dataset.html
- CUB_200_2011: The CUB_200_2011 dataset is a fine-grained dataset proposed by the California Institute of Technology (Caltech) in 2010 and is currently the benchmark image dataset for fine-grained classification recognition research. There are 11788 bird images in this dataset with 200 subclasses, including 5994 images in the train dataset and 5794 images in the query dataset. Each image provides label information, the bounding box of the bird, the key part information of the bird, and the attribute of the bird. The dataset is shown in the figure below.
- In-shop Clothes: In-shop Clothes is one of the 4 subsets of the DeepFashion dataset. It is a seller-show image dataset in which multi-angle images of each product id are collected in the same folder. The dataset contains 7982 items with 52712 images, each with 463 attributes, bounding boxes, landmarks, and store descriptions.
Website: http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
<a name="2.2"></a>
### 2.2 Vertical Class Datasets
<a name="2.2.1"></a>
#### 2.2.1 Animation Character Recognition
- iCartoonFace: iCartoonFace, developed by iQiyi (an online video platform), is the world's largest manually labeled detection and recognition dataset for cartoon characters, containing more than 5013 cartoon characters and 389,678 high-quality live images. Compared with other datasets, it boasts large scale, high quality, rich diversity, and challenging difficulty, making it one of the most commonly used datasets for studying cartoon character recognition.
Website: http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d
- Manga109: Manga109 is a dataset released in May 2020 for the study of cartoon character detection and recognition, which contains 21142 images; commercial use of the full dataset is officially prohibited. Manga109-s, a subset of this dataset, is available for industrial use, mainly for tasks such as text detection, sketch line-based search, and character image generation.
Website: http://www.manga109.org/en/
- IIT-CFW: The IIT-CFW dataset contains a total of 8928 labeled cartoon portraits of celebrities, covering 100 characters with varying numbers of portraits for each. In addition, it also provides 1000 real face photos (10 real portraits for each of the 100 public figures). This dataset can be employed to study both cartoon character recognition and cross-modal search tasks.
Website: http://cvit.iiit.ac.in/research/projects/cvit-projects/cartoonfaces
<a name="2.2.2"></a>
#### 2.2.2 Product Recognition
- AliProduct: The AliProduct dataset is the largest open-source product dataset. As an SKU-level image classification dataset, it contains 50,000 categories and 3 million images, ranking first in the industry in both respects. The dataset covers a large number of household goods, food, etc. Because it lacks manual annotation, the data is messy and unevenly distributed, with many similar product images.
Website: https://retailvisionworkshop.github.io/recognition_challenge_2020/
- Product-10k: Products-10k dataset has all its images from Jingdong Mall, covering 10,000 frequently purchased SKUs that are organized into a hierarchy. In total, there are nearly 190,000 images. In the real application scenario, the distribution of image volume is uneven. All images are manually checked/labeled by a team of production experts.
Website: https://www.kaggle.com/c/products-10k/data?select=train.csv
- DeepFashion-Inshop: The same as the common datasets In-shop Clothes.
<a name="2.2.3"></a>
#### 2.2.3 Logo Recognition
- Logo-2K+: Logo-2K+ is a dataset exclusively for logo image recognition, which contains 10 major categories, 2341 minor categories, and 167,140 images.
Website: https://github.com/msn199959/Logo-2k-plus-Dataset
- Tsinghua-Tencent 100K: This dataset is a large traffic sign benchmark based on 100,000 Tencent Street View panoramas. With 30,000 traffic sign instances included, it provides 100,000 images covering a wide range of illumination and weather conditions. Each traffic sign in the benchmark is labeled with its category, bounding box, and pixel mask. A total of 222 categories (0 background + 221 traffic signs) are incorporated.
Website: https://cg.cs.tsinghua.edu.cn/traffic-sign/
<a name="2.2.4"></a>
#### 2.2.4 Vehicle Recognition
- CompCars: The images, 136,726 of whole cars and 27,618 of car parts, come mainly from web and surveillance data. The web data covers 163 vehicle manufacturers and 1,716 vehicle models, and includes the bounding box, viewing angle, and 5 attributes (maximum speed, displacement, number of doors, number of seats, and vehicle type) of each car. The surveillance data comprises 50,000 front-view images.
Website: http://mmlab.ie.cuhk.edu.hk/datasets/comp_cars/
- BoxCars: The dataset contains a total of 21,250 vehicles, 63,750 images, 27 vehicle manufacturers, and 148 subcategories, all derived from surveillance data.
Website: https://github.com/JakubSochor/BoxCars
- PKU-VD Dataset: The dataset contains two large vehicle datasets (VD1 and VD2) that capture images from real-world unrestricted scenes in two cities. VD1 is obtained from high-resolution traffic cameras, while the images in VD2 are acquired from surveillance videos. The authors performed vehicle detection on the raw data to ensure that each image contains only one vehicle. Due to privacy constraints, all license plate numbers have been obscured with black overlays. All images are captured from the front view, and diverse attribute annotations are provided for each image, including identification numbers, accurate vehicle models, and colors. VD1 originally contained 1097649 images, 1232 vehicle models, and 11 vehicle colors; after removing images with multiple vehicles or images taken from the rear, 846358 images and 141756 vehicles remain. VD2 contains 807260 images, 79763 vehicles, 1112 vehicle models, and 11 vehicle colors.
Website: https://pkuml.org/resources/pku-vds.html

@ -0,0 +1,23 @@
Welcome to PaddleClas
================================
.. toctree::
:maxdepth: 1
introduction/index
installation/index
quick_start/index
image_recognition_pipeline/index
data_preparation/index
models_training/index
inference_deployment/index
models/index
algorithm_introduction/index
advanced_tutorials/index
others/index
faq_series/index

@ -0,0 +1,390 @@
# Image Classification FAQ Summary - 2020 Season 1
## Catalogue
- [1. Issue 1](#1)(2020.11.03)
- [2. Issue 2](#2)(2020.11.11)
- [3. Issue 3](#3)(2020.11.18)
- [4. Issue 4](#4)(2020.12.07)
- [5. Issue 5](#5)(2020.12.17)
- [6. Issue 6](#6)(2020.12.30)
<a name="1"></a>
## Issue 1
### Q1.1: What can PaddleClas be used for?
**A**: PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios.
It provides the whole process of model training, evaluation, inference, and deployment based on image classification to facilitate users' efficient learning. Specifically, PaddleClas contains the following features.
- PaddleClas provides 36 families of classification network structures (ResNet, ResNet_vd, MobileNetV3, Res2Net, HRNet, etc.) and training configurations, 175 pre-trained models, and performance evaluation and inference for free choice and application.
- PaddleClas provides a variety of inference deployment solutions such as TensorRT inference, python inference, c++ inference, Paddle-Lite inference deployment, PaddleServing, PaddleHub, etc., to facilitate inference deployment in multiple environments.
- PaddleClas provides a simple SSLD knowledge distillation scheme, based on which the recognition accuracy of distilled models registers a general improvement of more than 3%.
- PaddleClas provides 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, etc. with a detailed introduction, code replication, and evaluation of effectiveness in a unified experimental environment.
- PaddleClas supports CPU/GPU-based usage in Windows/Linux/macOS environments.
### Q1.2: What is the ResNet series model? What are they? Why are they so popular on the server side?
**A**: ResNet took the lead in introducing the residual structure, and the ResNet network is constructed by stacking multiple residual blocks. Experiments show that the use of residual blocks can effectively improve convergence speed and accuracy. In PaddleClas, ResNet contains structures with 18, 34, 50, 101, 152, and 200 layers. These models, proposed in 2015, have been validated in different application scenarios such as classification, detection, and segmentation; they have long been optimized by the industry and have acquired obvious advantages in terms of speed and accuracy, along with good support for TensorRT and FP16 inference. Therefore, the ResNet series models are recommended. Considering their large storage footprint, they are often used on the server side. For more information about ResNet models, please refer to the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385).
### Q1.3: What's the difference between the structure of ResNet_vd, ResNet and ResNet_vc?
**A**: The structures of ResNet_va to vd are shown in the figure below. ResNet was first proposed with the va structure: in the left feature transformation path (Path A) of the downsampling residual module, the first 1x1 convolution is strided, which leads to information loss (the kernel size is 1 and the stride is 2, so some features of the input feature map are not involved in the convolution). In the vb structure, the downsampling step is moved from the first 1x1 convolution to the middle 3x3 convolution, thus avoiding this loss of information; the default ResNet model in PaddleClas is ResNet_vb. The vc structure turns the initial 7x7 convolution into three 3x3 convolutions, with almost the same computation and storage and improved accuracy while the receptive field remains unchanged. The vd structure modifies the shortcut path (Path B) on the right side of the downsampling residual module, replacing the strided downsampling with average pooling. This collection of improvements (va->vd) adds little extra inference time and, combined with appropriate training strategies such as label smoothing and mixup data augmentation, can improve the accuracy by up to 2.7%.
![](../../images/faq/ResNet_vabcd_structure.png)
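The snippet below sketches the key difference in the shortcut path (Path B) of a downsampling residual block; the channel numbers are arbitrary examples for illustration, not the exact PaddleClas layer definitions.

```python
import paddle.nn as nn

# va/vb-style shortcut: a stride-2 1x1 convolution skips 3 of every 4 positions,
# so part of the input feature map never contributes to the output.
shortcut_vb = nn.Conv2D(256, 512, kernel_size=1, stride=2)

# vd-style shortcut: average pooling first (every position contributes),
# then a stride-1 1x1 convolution to match the channel count.
shortcut_vd = nn.Sequential(
    nn.AvgPool2D(kernel_size=2, stride=2),
    nn.Conv2D(256, 512, kernel_size=1, stride=1))
```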
### Q1.4 How to choose appropriate ResNet models for the actual scenario?
**A**:
Among the ResNet series models, the ResNet_vd model is recommended, for it achieves a significant improvement in accuracy with almost constant inference speed compared to the other variants. When the batch size is 4, the variation of inference time, FLOPs, Params and accuracy for different models on a T4 GPU is demonstrated in [ResNet and its vd series models](../models/ResNet_and_vd_en.md). If you want the smallest possible model storage or the fastest inference speed, please use the ResNet18_vd model; if you want the highest possible accuracy, we recommend the ResNet152_vd or ResNet200_vd models. For more information about the ResNet series models, please refer to [ResNet and its vd series models](../models/ResNet_and_vd_en.md)
- Variation of precision-inference speed
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.png)
- Variation of precision-params
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png)
- Variation of precision-flops
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png)
### Q1.5 Is conv-bn-relu a fixed form in a block of the network?
**A**:
Before the advent of batch-norm, the mainstream convolutional neural networks were fixed in the form of conv-relu. At the moment, conv-bn-relu is the fixed form of blocks in most of the convolutional networks, which is a relatively robust design. Besides, the block in DenseNet is adopted in the form of bn-relu-conv, which is the same combination used in ResNet-V2. In MobileNetV2, the middle layer of some blocks adopts conv-bn instead of the relu activation function to avoid information loss.
### Q1.6 What's the difference between ResNet34 and ResNet50
**A**:
There are two different kinds of blocks in the ResNet series, basic-block and bottleneck-block, and the ResNet network is constructed by the stacking of such blocks. basic-block is a stack of two 3x3 convolutional kernels with shortcut, while bottleneck-block is a stack of a 1x1 convolutional kernel, 3x3 convolutional kernel, and 1x1 convolutional kernel with shortcut, so there are two layers in the former one and three in the latter. The number of blocks stacked in ResNet34 and ResNet50 is the same, but the types of stacking are basic-block and bottleneck-block, respectively.
### Q1.7 Do large convolution kernels necessarily lead to positive returns?
**A**:
Not necessarily: enlarging all the convolution kernels in a network may not improve performance and can even hurt it. The paper [MixConv: Mixed Depthwise Convolutional Kernels](https://arxiv.org/abs/1907.09595) points out that increasing the kernel size within a certain range helps accuracy, but going beyond that range can cause accuracy loss. Therefore, considering model size and computation, very large convolution kernels are generally avoided when designing networks. There are also experiments on large convolution kernels in the [PP-LCNet](../models/PP-LCNet_en.md) article.
<a name="2"></a>
## Issue 2
### Q2.1: How does PaddleClas train its backbone?
**A**: The process is as follows:
- First, create a new model structure file under the folder `ppcls/arch/backbone/model_zoo/`, i.e. your own backbone. You can refer to resnet.py for model construction;
- Then add your own backbone class in `ppcls/arch/backbone/__init__.py`;
- Next, configure the yaml file for training, here you can refer to `ppcls/configs/ImageNet/ResNet/ResNet50.yaml`;
- Now you can start the training. A minimal sketch of the first two steps is shown below.
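The following is a minimal, hypothetical sketch of the first two steps; the file name `mynet.py` and the class name `MyNet` are placeholders, and a real backbone should follow the structure of `resnet.py` in the same folder.

```python
# ppcls/arch/backbone/model_zoo/mynet.py (hypothetical file name)
import paddle.nn as nn

class MyNet(nn.Layer):
    """A toy backbone: a conv stem, global average pooling and a classifier head."""
    def __init__(self, class_num=1000):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2D(3, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2D(64),
            nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2D(1)
        self.fc = nn.Linear(64, class_num)

    def forward(self, x):
        x = self.stem(x)
        x = self.pool(x).flatten(1)
        return self.fc(x)

# ppcls/arch/backbone/__init__.py: add one import line, following the existing ones
# from .model_zoo.mynet import MyNet
```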
### Q2.2: How to transfer the existing models and weights to your own classification tasks?
**A**: The process is as follows:
- First, a good pre-training model tends to be better transferred, so it is recommended to adopt a pre-training model with higher accuracy, for instance, series of industry-leading pre-training models provided by PaddleClas;
- Second, tune the training hyperparameters based on the size of the dataset to be transferred; they usually need some debugging to find a locally optimal value. If you have no relevant experience, it is recommended to start with the learning rate: smaller datasets generally call for a smaller learning rate, such as 0.001. In addition, a warmup strategy is suggested for the learning rate, so that a large learning rate at the beginning does not damage the pre-trained weights. During the transfer, different learning rates can also be set for different layers of the backbone, and gradually reducing the learning rate from the head to the tail of the network often works better. Data augmentation strategies can also be useful for small datasets, and PaddleClas offers 8 powerful data augmentation strategies for higher accuracy.
- After training, the above process can be iterated repeatedly until a local optimal value is found.
### Q2.3: Is the default parameter under configs in PaddleClas available for all datasets?
**A**:
The default parameters in the configuration files under `ppcls/configs/ImageNet/` in PaddleClas are the training parameters for ImageNet-1k. They are not suitable for all datasets, and specific datasets need further tuning on this basis.
### Q2.4 The resolution varies for different models in PaddleClas, so what is the standard?
**A**:
PaddleClas strictly follows the resolution used by the authors of each paper. Since AlexNet in 2012, most convolutional neural networks trained on ImageNet have used a resolution of 224x224. Google adjusted the resolution to 299x299 when designing InceptionV3 to fit the network structure, and the same resolution was used for the later Xception and InceptionV4. In addition, in EfficientNet, the authors argue that different resolutions should be used for networks of different sizes, which is what that series does. In practical scenarios, it is recommended to adopt the default resolution, but networks with deeper layers or larger widths can also try larger ones.
### Q2.5 There are many ssld models available in PaddleClas, what is the value of their application?
**A**:
There are many ssld pre-training models available in PaddleClas. They obtain better pre-training weights through semi-supervised knowledge distillation, so accuracy can be improved in transfer tasks or downstream vision tasks simply by replacing the ordinary pre-training weights with the more accurate ssld ones, without changing the structure files. For example, in PaddleSeg, [HRNet](../models/HRNet_en.md) with ssld pre-training weights achieves much better accuracy than comparable models in the industry; in PaddleDetection, [PP-YOLO](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/configs/ppyolo/README_cn.md) with ssld pre-training weights improves further on an already high baseline. Transfer learning for classification with ssld pre-training weights also yields impressive results, and the benefits of knowledge distillation for classification transfer are detailed in [SSLD Distillation Strategy](../advanced_tutorials/knowledge_distillation_en.md).
<a name="3"></a>
## Issue 3
### Q3.1: What is the improvement of DenseNet model over ResNet? What are the features or application scenarios?
**A**:
Compared to ResNet, DenseNet is designed with a more aggressive dense connectivity mechanism, which further reduces the number of parameters through feature reuse and bypass settings and mitigates the vanishing gradient problem to some extent. Moreover, thanks to the dense connections, the model is easier to train and comes with some regularization effect. DenseNet is a good choice for image classification scenarios where the amount of data is limited. More information about DenseNet and this series can be found in [DenseNet Models](../models/DPN_DenseNet_en.md).
### Q3.2: What are the improvements of the DPN network over DenseNet?
**A**:
The full name of DPN is Dual Path Networks. It combines DenseNet and ResNeXt, showing that DenseNet can extract new features from preceding layers, while ResNeXt essentially reuses features already extracted by preceding layers. The authors further find that ResNeXt has a high feature reuse rate but low redundancy, while DenseNet can create new features but with high redundancy. Combining the advantages of both structures, the authors designed the DPN network. In the end, DPN achieves better results than ResNeXt and DenseNet with the same FLOPs and number of parameters. More introduction and series models of DPN can be found in [DPN Models](../models/DPN_DenseNet_en.md).
### Q3.3: How to use multiple models for inference fusion?
**A**:
When using multiple models for inference, it is recommended to first export each pre-trained model as an inference model, which removes the dependency on the network structure definition; you can refer to the [model export script](../../../tools/export_model.py). Then refer to the [inference script for inference models](../../../deploy/python/predict_cls.py), and create multiple predictors according to the number of models used; a minimal sketch is shown below.
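Below is a minimal sketch of creating several predictors with the Paddle Inference API and averaging their outputs; the model paths are placeholders and the image preprocessing is omitted.

```python
import numpy as np
from paddle.inference import Config, create_predictor

def build_predictor(model_file, params_file):
    config = Config(model_file, params_file)
    config.disable_glog_info()
    return create_predictor(config)

# hypothetical paths to two exported inference models
predictors = [
    build_predictor("model_a/inference.pdmodel", "model_a/inference.pdiparams"),
    build_predictor("model_b/inference.pdmodel", "model_b/inference.pdiparams"),
]

def ensemble_predict(img):
    """img: preprocessed array of shape [1, 3, 224, 224]; returns the averaged outputs."""
    outputs = []
    for predictor in predictors:
        input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
        input_handle.copy_from_cpu(img.astype("float32"))
        predictor.run()
        output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
        outputs.append(output_handle.copy_to_cpu())
    return np.mean(outputs, axis=0)  # simple average over all models
```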
### Q3.4: How to add your own data augmentation methods in PaddleClas?
**A**:
- For single-image augmentation, you can refer to the [single-image data augmentation scripts](../../../ppcls/data/preprocess/ops). Following the data operators `ResizeImage` or `CropImage`, create a new class and implement the corresponding augmentation in its `__call__` method.
- For batch-level augmentation, you can refer to the [batch data augmentation scripts](../../../ppcls/data/preprocess/batch_ops). Following the data operators `MixupOperator` or `CutmixOperator`, create a new class and implement the corresponding augmentation in its `__call__` method. A minimal sketch of a single-image operator is shown below.
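As a reference, here is a minimal sketch of a hypothetical single-image operator; `RandomGrayscale` and its parameter `p` are made up for illustration, and registration should follow the existing operators under `ppcls/data/preprocess`.

```python
import random
import numpy as np

class RandomGrayscale(object):
    """Hypothetical operator: convert an HWC image to 3-channel grayscale with probability p."""
    def __init__(self, p=0.1):
        self.p = p

    def __call__(self, img):
        if random.random() < self.p:
            gray = np.dot(img[..., :3], [0.299, 0.587, 0.114])
            img = np.stack([gray, gray, gray], axis=-1).astype(img.dtype)
        return img
```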
### Q3.5: How to further accelerate the model training?
**A**:
- You can adopt auto-mixed precision training, which can gain a significantly faster speed with almost zero precision loss. Take ResNet50 as an example, the configuration file of auto-mixed precision training in PaddleClas can be found at: [ResNet50_fp16.yml](../../../ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml). The main step is to add the following lines to the standard configuration file.
```yaml
# mixed precision training
AMP:
scale_loss: 128.0
use_dynamic_loss_scaling: True
use_pure_fp16: &use_pure_fp16 True
```
- You can turn on DALI to run the data preprocessing on the GPU. When the model is relatively small (the data reader accounts for a larger share of the time), an obvious speedup can be obtained with DALI enabled, which can be done by adding `-o Global.use_dali=True` during training. You can refer to the [DALI installation tutorial](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html#nightly-builds) for more details.
<a name="4"></a>
## Issue 4
### Q4.1: How many types of model files are there in PaddlePaddle?
**A**:
- There are two types of model-related files saved in PaddlePaddle.
- One type is used for *inference deployment*, including files with the suffixes "`pdmodel`" and "`pdiparams`", where the "`pdmodel`" file stores the model network structure and the "`pdiparams`" file stores the model parameters. Inference deployment files are saved and loaded with the `paddle.jit.save` and `paddle.jit.load` interfaces.
- The other type is used during *training and tuning*, including files with the suffixes "`pdparams`" and "`pdopt`", where the "`pdparams`" file stores the model parameters during training and the "`pdopt`" file stores the optimizer state during training. Training files are saved and loaded with the `paddle.save` and `paddle.load` interfaces.
- The inference deployment files allow you to build the model network structure and load the parameters for inference, while the training files allow you to load the parameters and optimizer state to resume training. A minimal sketch of the two save/load pairs is shown below.
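A minimal sketch of the two pairs of interfaces is shown below, using a toy model and optimizer; the output paths are placeholders.

```python
import paddle
import paddle.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))  # toy model
opt = paddle.optimizer.Momentum(learning_rate=0.1, parameters=model.parameters())

# --- training / tuning files: *.pdparams and *.pdopt ---
paddle.save(model.state_dict(), "output/model.pdparams")    # model parameters
paddle.save(opt.state_dict(), "output/model.pdopt")         # optimizer state
model.set_state_dict(paddle.load("output/model.pdparams"))  # resume training
opt.set_state_dict(paddle.load("output/model.pdopt"))

# --- inference deployment files: *.pdmodel and *.pdiparams ---
paddle.jit.save(
    model, "output/inference",
    input_spec=[paddle.static.InputSpec([None, 3, 224, 224], "float32")])
deployed = paddle.jit.load("output/inference")               # structure + parameters
```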
### Q4.2: What are the innovations of HRNet?
**A**:
- In the field of image classification, most neural networks are designed around the idea of extracting high-dimensional features of images. Specifically, the input image usually has a high spatial resolution, and through multiple convolution and pooling layers, a feature map with lower spatial resolution but higher dimension is gradually obtained and used in scenarios such as classification.
- However, the authors of *HRNet* believe that this idea of gradually decreasing the spatial resolution is not suitable for scenarios such as object detection (region-level classification) and semantic segmentation (pixel-level classification), because a lot of information is lost in the process and the final learned features can hardly represent the information of the original image at high spatial resolution, while both region-level and pixel-level classification tasks are very sensitive to spatial precision.
- Therefore, the authors of *HRNet* propose keeping feature maps of different spatial resolutions in parallel, whereas neural networks such as *VGG* cascade feature maps of different spatial resolutions through successive convolution and pooling layers. Moreover, *HRNet* connects feature maps of equal depth and different spatial resolutions so that information can be fully exchanged. The specific network structure is shown in the figure below.
![](../../images/faq/HRNet.png)
### Q4.3: In HRNet, how are connections made between feature graphs with different spatial resolutions?
**A**:
- First, in *HRNet*, a *3 × 3* convolution with a *stride* of *2* is used to obtain a feature map with lower spatial resolution but more channels. For the low-resolution feature maps, a *1 × 1* convolution is first used to match the number of channels, and then nearest-neighbor interpolation is used for upsampling to obtain a feature map with the same spatial resolution and number of channels as the high-resolution one. For feature maps of the same spatial resolution, an identity mapping is applied directly. The details are shown in the following figure, and a minimal sketch in code follows it.
![](../../images/faq/HRNet_block.png)
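The following sketch illustrates the three cases with plain Paddle operators (a strided 3x3 convolution for downsampling, a 1x1 convolution plus nearest-neighbor upsampling, and an identity mapping for equal resolutions); it is a simplified illustration, not the actual HRNet code.

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

high = paddle.randn([1, 32, 56, 56])  # high-resolution branch
low = paddle.randn([1, 64, 28, 28])   # low-resolution branch

# high -> low: 3x3 convolution with stride 2 (also raises the channel count)
down = nn.Conv2D(32, 64, kernel_size=3, stride=2, padding=1)
high_to_low = down(high)  # [1, 64, 28, 28]

# low -> high: 1x1 convolution to match channels, then nearest-neighbor upsampling
up = nn.Conv2D(64, 32, kernel_size=1)
low_to_high = F.interpolate(up(low), scale_factor=2, mode="nearest")  # [1, 32, 56, 56]

# same resolution: identity mapping, branches fused by addition
fused_high = high + low_to_high
fused_low = low + high_to_low
```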
### Q4.4: What does "SE" in the model stand for?
**A**:
- SE indicates that the model uses an SE structure, which is derived from the winning solution of the 2017 ImageNet classification competition, *Squeeze-and-Excitation Networks (SENet)*, and can be migrated to any other network. The *scale* vector has the same dimension as the number of channels in the feature map, and the value learned for each dimension indicates how much the corresponding feature channel should be enhanced or weakened, so that important feature channels are enhanced and unimportant ones are suppressed, making the extracted features more discriminative.
### Q4.5: How is the SE structure implemented?
![](../../images/faq/SE_structure.png)
**A**:
- The *SE* structure is shown in the figure above. First, *Ftr* represents a regular convolution operation, and *X* and *U* are the input and output feature maps of *Ftr*. After obtaining the feature map *U*, the *Fsq* and *Fex* operations are applied to obtain the *scale* vector, whose dimension is *C*, the same as the number of channels of *U*, so it can be applied to *U* by channel-wise multiplication to obtain *X~*.
- Specifically, *Fsq* is a *Global Average Pooling* operation, which the *SENet* authors call *Squeeze* because it compresses *U* from *C × H × W* to *C × 1 × 1*; the *Fex* operation is then applied to the output of *Fsq*.
- The *Fex* operation consists of two fully connected layers and is referred to as *Excitation* by the authors. The first fully connected layer compresses the vector from *1 × 1 × C* to *1 × 1 × C/r*, followed by *ReLU*, and the second fully connected layer restores the dimension to *C*. The purpose of this bottleneck is to reduce computation. The *SENet* authors conclude experimentally that *r=16* gives a good balance between accuracy gain and computational cost.
- For *Fsq*, the key is to obtain a *C*-dimensional vector, so it is not limited to *Global Average Pooling*. The *SENet* authors argue that since the final *scale* is applied to *U* channel by channel, the *scale* of each channel should be computed from the information of that channel, so the simplest choice, *Global Average Pooling*, is adopted; the final *scale* vector then describes the distribution across channels while ignoring the distribution within each channel.
- For *Fex*, its role is to learn, through training on each *mini-batch*, a mapping that approximates the channel importance distribution over the whole training set. Since training is performed on *mini-batches* while the ideal *scale* would be based on all training data, training *Fex* across mini-batches yields a more reliable *scale*. A minimal sketch of an SE block is shown below.
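Below is a minimal sketch of an SE block following this description (global average pooling, two fully connected layers with reduction ratio r, sigmoid, and channel-wise scaling); it is illustrative rather than the exact PaddleClas implementation.

```python
import paddle
import paddle.nn as nn

class SEBlock(nn.Layer):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2D(1)   # Fsq: C x H x W -> C x 1 x 1
        self.excitation = nn.Sequential(          # Fex: two fully connected layers
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, u):
        n, c = u.shape[0], u.shape[1]
        scale = self.squeeze(u).reshape([n, c])          # [N, C]
        scale = self.excitation(scale).reshape([n, c, 1, 1])
        return u * scale                                 # channel-wise re-weighting

x = paddle.randn([2, 64, 32, 32])
print(SEBlock(64)(x).shape)  # [2, 64, 32, 32]
```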
<a name="5"></a>
## Issue 5
### Q5.1 How to choose an optimizer?
**A**:
Since the emergence of deep learning, there has been a lot of research on optimizers, which aim to minimize the loss function and find suitable weights for a given task. Currently, the main optimizers used in industry are SGD, RMSProp, Adam, AdaDelta, etc. Among them, SGD with momentum is widely used in academia and industry (the discussion here is limited to classification tasks), so most of the models we release are trained with this optimizer. It has two disadvantages: slow convergence, and reliance on experience when setting the initial learning rate. However, if the initial learning rate is set properly and the number of iterations is sufficient, this optimizer will stand out among the others and reach higher accuracy on the validation set. Optimizers with adaptive learning rates, such as Adam and RMSProp, tend to converge faster, but their final convergence accuracy is slightly worse. If you pursue faster convergence, we recommend these adaptive learning rate optimizers; if you pursue higher convergence accuracy, use SGD with momentum. Specific recommendations for different datasets are as follows:
- ImageNet-1k: It is recommended to use the SGD optimizer with momentum only.
- Other datasets (ImageNet-1k pre-training by default): When loading the pre-training model, you can consider an optimizer such as Adam (which may work better), but the SGD optimizer with momentum is definitely a good solution.
In addition, to further speed up the training, Lookahead optimizer is also a good choice. On ImageNet-1k, it can guarantee the same convergence accuracy at a faster rate, but the performance is less stable on some datasets and requires further tuning.
### Q5.2 How to set the initial learning rate and the learning rate decay strategy?
**A**: The choice of learning rate is often related to the optimizer as well as the data and the task. The learning rate determines how quickly the network weights are updated. The lower the learning rate, the slower the loss function will change. While using a low learning rate ensures that no local minimal values are missed, it also means that it takes longer to converge, especially if trapped in a plateau region.
Throughout the training process, we cannot use the same learning rate to update the weights, otherwise the optimal point cannot be reached, so the learning rate needs to be adjusted during training. In the initial stage of training, since the weights are randomly initialized and the loss function decreases quickly, a larger learning rate can be used. In the later stage, since the weights are already close to the optimal value, a large learning rate can no longer find a better solution, so a smaller learning rate is the better choice. As for the learning rate decay strategy, many researchers and practitioners use piecewise_decay (step decay), which decreases the learning rate in steps. In addition, other strategies have been proposed, such as polynomial_decay, exponential_decay, and cosine_decay. Among them, cosine_decay requires no hyperparameter tuning and is more robust, so it has become the preferred learning rate decay strategy for improving model accuracy.
The learning rate curves of cosine_decay and piecewise_decay are shown in the following figure. It is easy to observe that cosine_decay keeps a relatively large learning rate throughout most of the training, so its convergence is slower, but its final result is better than that of piecewise_decay.
![](../../images/models/lr_decay.jpeg)
In addition, it is also observed that cosine_decay uses a small learning rate for only a few epochs, which affects the final accuracy, so it is recommended to train for more epochs for better results.
Finally, when training a neural network with a large batch_size, it is recommended to use the warmup strategy, which, as the name implies, warms up the learning rate: instead of using the maximum learning rate from the very beginning, the network is first trained with a gradually increasing learning rate, which is then decayed after it peaks. Experiments show that warmup can steadily improve the accuracy of the model when the batch_size is large. Specific recommendations for different datasets are as follows:
- ImageNet-1k: The recommended batch size is 256, the initial learning rate is 0.1, and cosine decay is used to decrease the learning rate.
- Other datasets (ImageNet-1k pre-training loaded by default): the larger the dataset, the larger the initial learning rate, preferably not exceeding 0.1 (when the batch size is 256); the smaller the dataset, the smaller the initial learning rate. For small datasets, warmup also brings some accuracy improvement, and cosine decay is still recommended as the learning rate decay strategy; see the sketch after this list.
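Here is a minimal sketch of combining warmup with cosine decay using Paddle's learning rate schedulers; the concrete numbers (5 warmup epochs, 100 total epochs, base learning rate 0.1, steps per epoch) are illustrative assumptions rather than values taken from a PaddleClas config.

```python
import paddle

epochs, warmup_epochs, steps_per_epoch = 100, 5, 5005  # illustrative values
base_lr = 0.1

cosine = paddle.optimizer.lr.CosineAnnealingDecay(
    learning_rate=base_lr, T_max=epochs * steps_per_epoch)
scheduler = paddle.optimizer.lr.LinearWarmup(
    learning_rate=cosine,
    warmup_steps=warmup_epochs * steps_per_epoch,
    start_lr=0.0,
    end_lr=base_lr)

# attach the scheduler to an SGD-with-momentum optimizer
model = paddle.nn.Linear(10, 2)  # placeholder model
optimizer = paddle.optimizer.Momentum(
    learning_rate=scheduler, momentum=0.9, parameters=model.parameters())

# since T_max and warmup_steps are given in iterations, call step() once per iteration
for _ in range(3):
    scheduler.step()
    print(scheduler.get_lr())
```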
### Q5.3 How to set the batch-size?
**A**:
Batch_size is an important hyperparameter in neural network training; its value determines how much data is fed into the neural network for training at a time. Previous researchers have found experimentally that when the batch_size is scaled linearly together with the learning rate, the convergence accuracy is almost unaffected. When training on ImageNet-1k, most neural networks choose an initial learning rate of 0.1 with a batch_size of 256. Specific recommendations for different datasets are as follows:
- ImageNet-1k: learning rate is set to 0.1*k, batch_size is set to 256*k.
- Other datasets (ImageNet-1k pre-training loaded by default): the values can be set according to the actual situation (e.g. a smaller learning rate), but when adjusting the learning rate or the batch size, the other one should be adjusted proportionally at the same time.
### Q5.4 What is weight_decay? How to set it?
**A**:
Overfitting is a common term in machine learning. It can be simply understood as a model that performs well on the training data but poorly on the test data. Image classification also suffers from overfitting, and many regularization methods have been proposed to avoid it, among which weight_decay is one of the most widely used. When using the SGD optimizer, weight_decay is equivalent to adding an L2 regularization term to the final loss function, which encourages the network weights to take smaller values, so the parameter values of the whole network tend toward 0 and the generalization performance of the model improves accordingly. In major deep learning frameworks, this value is the coefficient of the L2 regularization term, which is called L2Decay in the PaddlePaddle framework. The larger the coefficient, the stronger the regularization and the more the model tends to underfit. Specific recommendations for different datasets are as follows:
- ImageNet-1k: Most networks set the value of this parameter to 1e-4, and in some smaller networks such as the MobileNet series network, the value is set between 1e-5 and 4e-5 to avoid the underfitting. The following table shows the accuracy of MobileNetV1_x0_25 on ImageNet-1k using different L2Decay. Since MobileNetV1_x0_25 is a relatively small network, an overly large L2Decay will tend to underfit the network, so 3e-5 is a better choice in this network compared to 1e-4.
| Model | L2Decay | Train acc1/acc5 | Test acc1/acc5 |
| ----------------- | ------- | --------------- | -------------- |
| MobileNetV1_x0_25 | 1e-4 | 43.79%/67.61% | 50.41%/74.70% |
| MobileNetV1_x0_25 | 3e-5 | 47.38%/70.83% | 51.45%/75.45% |
In addition, the setting of this value is also related to whether other regularizations are used during training. If the data preprocessing is complicated, which equates with a harder training task, the value can be reduced appropriately. The following table shows the accuracy of ResNet50 on ImageNet-1k using different L2Decay after the RandAugment preprocessing. It is easy to observe that a smaller l2_decay helps to improve the model accuracy for a harder task.
| Model | L2Decay | Train acc1/acc5 | Test acc1/acc5 |
| -------- | ------- | --------------- | -------------- |
| ResNet50 | 1e-4 | 75.13%/90.42% | 77.65%/93.79% |
| ResNet50 | 7e-5 | 75.56%/90.55% | 78.04%/93.74% |
- Other datasets (ImageNet-1k pre-training loaded by default): when transferring, it is better to keep the L2Decay value used when training on ImageNet-1k (i.e. the L2Decay value of the pre-trained weights; the value for each backbone can be found in the corresponding yaml file), and adjusting only the learning rate is usually enough for general datasets. A sketch of setting L2Decay on the optimizer is shown below.
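For reference, a minimal sketch of setting L2Decay on the optimizer in Paddle is shown below; the value 1e-4 matches the common ImageNet-1k setting mentioned above, and the model is a placeholder.

```python
import paddle

model = paddle.nn.Linear(10, 2)  # placeholder model
optimizer = paddle.optimizer.Momentum(
    learning_rate=0.1,
    momentum=0.9,
    parameters=model.parameters(),
    weight_decay=paddle.regularizer.L2Decay(1e-4))  # coefficient of the L2 penalty
```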
### Q5.5 Should I use label_smoothing and how to set the value of the parameter?
**A**:
Label_smoothing is a regularization method in deep learning, whose full name is Label Smoothing Regularization (LSR). In the traditional classification task, the loss function is the cross entropy between the real one-hot label and the output of the neural network, while label_smoothing smooths the real one-hot label, so that the label learned by the network is no longer a hard label but a soft label with probability values, where the probability at the position of the true category is the largest and the others are small. In label_smoothing, the epsilon parameter describes the degree of softening: the larger the value, the smaller the probability at the true-category position of the smoothed label vector and the smoother the label, and vice versa. Specific recommendations for different datasets are as follows:
- ImageNet-1k: This value is usually set to 0.1 in experiments on ImageNet-1k, and for models of ResNet50 size and above there is a steady accuracy increase after using label_smoothing. The following table shows the accuracy metrics of ResNet50_vd before and after using label_smoothing.
| Model | Use_label_smoothing(0.1) | Test acc1 |
| ----------- | ------------------------ | --------- |
| ResNet50_vd | 0 | 77.9% |
| ResNet50_vd | 1 | 78.4% |
At the same time, since label_smoothing can be regarded as a regularization method, the accuracy improvement is not obvious or the accuracy even decreases on relatively small models. The following table shows the accuracy metrics of ResNet18 before and after using label_smoothing on ImageNet-1k. It is clear that the accuracy drops after using label_smoothing.
| Model | Use_label_smoothing(0.1) | Train acc1/acc5 | Test acc1/acc5 |
| -------- | ----------------------- | --------------- | -------------- |
| ResNet18 | 0 | 69.81%/87.70% | 70.98%/89.92% |
| ResNet18 | 1 | 68.00%/86.56% | 70.81%/89.89% |
Here is a trick to make label_smoothing effective even in smaller models: add a fully connected layer of size 1000-2000 after the Global Average Pooling layer, which works better together with label_smoothing.
- Other datasets (ImageNet-1k pre-training loaded by default): using label_smoothing tends to improve the accuracy, and the smaller the dataset, the larger the epsilon value can be. On some small fine-grained image datasets, the best model is usually obtained with the value set to 0.4-0.5. A sketch of applying label smoothing is shown below.
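A minimal sketch of label smoothing with Paddle's functional API is shown below; epsilon=0.1 is the ImageNet-1k setting discussed above, and the logits are random placeholders.

```python
import paddle
import paddle.nn.functional as F

num_classes, epsilon = 10, 0.1
labels = paddle.to_tensor([1, 3])                      # hard labels
one_hot = F.one_hot(labels, num_classes)               # [2, 10] one-hot labels
soft = F.label_smooth(one_hot, epsilon=epsilon)        # smoothed soft labels

logits = paddle.randn([2, num_classes])
loss = F.cross_entropy(logits, soft, soft_label=True)  # cross entropy with soft labels
print(loss)
```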
### Q5.6 Is random crop adjustable in the default image preprocessing? How?
**A**:
In the standard preprocessing of ImageNet-1k data, the random_crop function defines two values, scale and ratio, which respectively determine the size of the crop and the degree of image stretching. The default value of the former is 0.08-1 (lower_scale-upper_scale), and that of the latter is 3/4-4/3 (lower_ratio-upper_ratio). On very small networks, this kind of data augmentation can lead to underfitting and decreased accuracy. To this end, the data augmentation can be weakened by increasing the crop area or reducing the stretching of the image, i.e. by increasing the value of lower_scale or narrowing the range between lower_ratio and upper_ratio, respectively. Specific recommendations for different datasets are as follows:
- ImageNet-1k: It is recommended to use only the default value for networks that are not particularly small, and to increase the value of lower_scale (to increase the crop area) or decrease the range of ratio values (to weaken the image stretching) for networks that are particularly small, and conduct the opposite for networks that are particularly large. The following table shows the accuracy of training MobileNetV2_x0_25 with different lower_scale, and we can see that the training accuracy and verification accuracy are improved by increasing the crop area of the images.
| Model | Range of Scale | Train_acc1/acc5 | Test_acc1/acc5 |
| ----------------- | -------------- | --------------- | -------------- |
| MobileNetV2_x0_25 | [0.08,1] | 50.36%/72.98% | 52.35%/75.65% |
| MobileNetV2_x0_25 | [0.2,1] | 54.39%/77.08% | 53.18%/76.14% |
- Other datasets (ImageNet-1k pre-training loaded by default): it is recommended to use the default values; if overfitting is severe, consider decreasing the value of lower_scale (to allow smaller crop areas) or widening the range of ratio values (to strengthen the image stretching). A sketch of these two settings is shown below.
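The sketch below illustrates the scale and ratio arguments with `paddle.vision.transforms.RandomResizedCrop`, which performs the same kind of random crop; note that PaddleClas configs use the `RandCropImage` operator, so treat this only as an illustration of strengthening or weakening the augmentation.

```python
import numpy as np
from PIL import Image
from paddle.vision import transforms

# default ImageNet-style random crop: scale in [0.08, 1.0], ratio in [3/4, 4/3]
default_crop = transforms.RandomResizedCrop(224, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3))
# weaker augmentation for very small networks: larger minimum crop area, narrower ratio range
weak_crop = transforms.RandomResizedCrop(224, scale=(0.2, 1.0), ratio=(0.9, 1.1))

img = Image.fromarray((np.random.rand(256, 256, 3) * 255).astype("uint8"))  # dummy image
print(default_crop(img).size, weak_crop(img).size)  # (224, 224) for both
```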
### Q5.7 What are the common data augmentation? How to choose?
**A**:
In general, the size of the dataset is crucial to the performance, but the annotation of images is often expensive, so annotated images are often scarce, which highlights the importance of data augmentation. In the standard data augmentation for training ImageNet-1k, two methods, Random_Crop and Random_Flip, are mainly adopted. However, in recent years, more and more data augmentation methods have been proposed, such as cutout, mixup, cutmix, AutoAugment, etc. Experiments show that these methods can effectively improve the accuracy of the model. Specific recommendations for different datasets are as follows:
- ImageNet-1k: The following table lists the performance of ResNet50 adopting 8 different data augmentation methods. It can be seen that all of them are beneficial compared to baseline, with cutmix being the most effective data augmentation so far. For more information about data augmentation, please refer to the chapter of [**Data Augmentation**](../advanced_tutorials/DataAugmentation_en.md).
| Model | Data Augmentation Method | Test top-1 |
| -------- | ------------------------ | ---------- |
| ResNet50 | Standard Transformation | 77.31% |
| ResNet50 | Auto-Augment | 77.95% |
| ResNet50 | Mixup | 78.28% |
| ResNet50 | Cutmix | 78.39% |
| ResNet50 | Cutout | 78.01% |
| ResNet50 | Gridmask | 77.85% |
| ResNet50 | Random-Augment | 77.70% |
| ResNet50 | Random-Erasing | 77.91% |
| ResNet50 | Hide-and-Seek | 77.43% |
- Other datasets (ImageNet-1k pre-training loaded by default): these methods generally bring accuracy gains on other datasets as well, with the exception of Auto-Augment. Auto-Augment searches an independent set of hyperparameters for each dataset to determine how the data is processed, so the default ImageNet-1k hyperparameters are not suitable for all datasets; you can use Random-Augment instead of Auto-Augment. The other strategies can be used normally, but for harder tasks or smaller networks, strong data augmentation is not recommended.
In addition, multiple data augmentations can be overlaid to further improve accuracy when the data set is simple or the data size is small.
### Q5.8 How to determine the tuning strategy by train_acc and test_acc?
**A**:
In the process of training a network, the accuracy of the training set and validation set are usually printed for each epoch, which portrays the performance of the model on both datasets. Generally speaking, the training set accuracy reflects the data accuracy after Random-Crop, and since the data is often more complex after Random-Crop, the training set accuracy and the validation set accuracy are often not the same concepts.
- ImageNet-1k: Generally speaking, it is good for the training set accuracy to be comparable to or slightly higher than the validation set accuracy. If the training set accuracy is much higher than the validation set accuracy, the model has overfitted the training set and more regularization is needed, such as increasing the value of L2Decay, adding more data augmentation strategies, or introducing label_smoothing. If the training set accuracy is lower than the validation set accuracy, the model is probably underfitting, and the regularization should be weakened, for example by reducing the value of L2Decay, using fewer data augmentation methods, increasing the crop area, weakening the image stretching, or removing label_smoothing.
- Other datasets (ImageNet-1k pre-training loaded by default): basically the same as for ImageNet-1k training. In addition, if the model tends to overfit on other datasets (train acc much larger than test acc), you can also switch to better pre-trained weights: PaddleClas provides SSLD distillation pre-trained weights for common networks, which are better than the plain ImageNet-1k ones and are worth trying.
- **[Note]** It is not recommended to readjust the training strategy according to the loss. After using different data augmentation, the size of the train loss varies greatly. For example, after using Cutmix or RandAugmentation, the train loss will exceed the test loss, and when the data augmentation strategy is weakened, the train loss will be smaller than the test loss, making it more difficult to adjust.
### Q5.9 How to improve the accuracy on your own dataset by using pre-trained models?
**A**:
At this stage, it has become common practice in the image recognition field to load pre-trained models when training on your own task, which usually improves accuracy compared with training from random initialization. In general, the pre-trained models widely used in industry are obtained by training on the ImageNet-1k dataset of 1.28 million images in 1000 classes. The fc layer weights of such a pre-trained model form a matrix of size k*1000, where k is the number of neurons before the fc layer, and it is not necessary to load the fc layer weights when loading the pre-trained weights. In terms of the learning rate, if your dataset is particularly small (e.g. fewer than 1,000 images), we recommend a small initial learning rate, e.g. 0.001 (batch_size 256, the same below), so that the pre-trained weights are not corrupted by a large learning rate. If your training dataset is relatively large (more than 100,000 images), we suggest trying a larger initial learning rate, such as 0.01 or above. If the target dataset is small, you can also freeze some shallow weights. Also, if you want to train on a small dataset in a specific vertical domain, you can first train pre-trained weights on a related large dataset and then fine-tune the model on those weights with a smaller learning rate. A sketch of loading pre-trained weights while skipping the fc layer is shown below.
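A minimal sketch of loading pre-trained weights while skipping the final fc layer is shown below; the toy backbone, the file name, and the `fc.` prefix are assumptions and should be adapted to your actual model.

```python
import paddle
import paddle.nn as nn

class ToyBackbone(nn.Layer):
    def __init__(self, class_num):
        super().__init__()
        self.conv = nn.Conv2D(3, 8, 3, padding=1)
        self.pool = nn.AdaptiveAvgPool2D(1)
        self.fc = nn.Linear(8, class_num)

    def forward(self, x):
        return self.fc(self.pool(self.conv(x)).flatten(1))

# pretend "pretrained.pdparams" was trained with class_num=1000 (hypothetical path)
paddle.save(ToyBackbone(1000).state_dict(), "pretrained.pdparams")

model = ToyBackbone(class_num=5)                     # your own task has 5 classes
state = paddle.load("pretrained.pdparams")
state = {k: v for k, v in state.items() if not k.startswith("fc.")}  # drop fc weights
model.set_state_dict(state)                          # fc stays randomly initialized
```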
### Q5.10 Existing strategies have saturated the accuracy of the model, how can the accuracy of a particular model be further improved?
**A**: If the existing strategy cannot further improve the accuracy of the model, it means that the model has almost reached saturation with the existing dataset and strategy, and two methods are provided here.
- Mining relevant data: Use the model trained on the existing dataset to make predictions on the relevant data, label the data with higher confidence and add it to the training set for further training. Repeat the steps above to further improve the accuracy of the model.
- Knowledge distillation: You can use a larger model to train a teacher model with higher accuracy on the dataset, and then use the teacher model to teach a student model, where the student model is the target model. PaddleClas provides Baidu's own SSLD knowledge distillation scheme, which can steadily improve accuracy by more than 3% even on a challenging classification task such as ImageNet-1k. For the chapter on SSLD knowledge distillation, please refer to [**SSLD Knowledge Distillation**](../advanced_tutorials/knowledge_distillation_en.md).
<a name="6"></a>
## Issue 6
### Q6.1: What are the differences between the several branches of PaddleClas? How should I choose?
**A**: PaddleClas currently has 3 branches:
- Develop: develop branch is the development branch of PaddleClas as well as the most updated branch. All new features and changes will proceed on this branch first. If you want to keep track of the latest progress of PaddleClas, you can follow this branch. This branch mainly supports dynamic graphs and will be updated along with the version of paddlepaddle.
- Stable release (e.g. release/2.1.3): Fast updates keep followers informed of the latest progress, but they can also bring instability. Therefore, at critical points, we pull branches from the develop branch to provide stable releases, and the latest stable branch is also the default branch. Note that we only maintain the latest stable release, and generally fix bugs only without updating new features and models unless there are special circumstances.
- Static branch: a branch that uses the static graph version, mainly kept for old users, and only receives simple maintenance with no new features or models. It is not recommended for new users, and those who still use it are advised to switch to the dynamic graph branch or the stable release branch when conditions permit.
In general, it is recommended to choose the develop branch if you want to keep up with the latest developments of PaddleClas, and the latest stable release branch if you need a stable version.
### Q6.2: What is the static graph mode?
**A**:
The static graph mode is declarative programming, which was initially adopted by many deep learning frameworks such as TensorFlow and MXNet. In this mode, you need to define the model structure first, and then the framework compiles and optimizes it to build the "computational graph". It can be simply understood as a mode in which the "computational graph" is fixed and unchanging. Its advantage is that the compiler generally builds the graph only once, so it is relatively efficient; the disadvantage is that it is not flexible and is troublesome to debug. For example, to run a static graph model in Paddle, you need to complete all the operations and then extract the output according to a specific key, which means you cannot get intermediate results in real time.
### Q6.3: What is the dynamic graph mode?
**A**:
Dynamic graph mode is imperative programming, where users do not need to pre-define the network structure, and each line of code can be run directly to get the result. Compared with static graph mode, it is more user-friendly and easier to debug. In addition, the structure design in dynamic graph mode is more flexible, and the structure can be adjusted at any time during execution.
PaddleClas currently uses a dynamic graph model for its continuously updated develop branch and the stable release branch. If you are new, it is recommended to use dynamic graph mode for development and training. If there is a performance requirement for inference and prediction, you can convert the dynamic graph model to a static one to improve efficiency after the training.
### Q6.5: When building a classification dataset, how to build the data of the "background" category?
**A**:
In practice, it is often necessary to construct your own classification dataset for training purposes. In addition to the required category data, an additional category is needed, i.e., a "background" category. For example, if we create a cat and dog classification, with cats as one category and dogs as another, then an input image of a rabbit will be forced into one of these two categories. Therefore, during training, we should add some data from non-target categories as the "background" category data.
When constructing the data for the "background" category, the first step is to consider the actual requirements. For example, if the actual test data are all animals, the "background" category data should include some animals other than dogs and cats. If the test data contains more categories, such as a tree, then the "background" category should be enriched. To put it simply, the data in the "background" category should be collected according to the situations that may occur in the actual scenario. The more situations included, the more types of data need to be covered, and the more difficult the task will be. Therefore, in practice, it is better to limit the problem to avoid the waste of resources and computing power.
@ -0,0 +1,267 @@
# Image Classification FAQ Summary - 2021 Season 1
## Catalogue
- [1. Issue 1](#1)(2021.01.05)
- [2. Issue 2](#2)(2021.01.14)
- [3. Issue 3](#3)(2021.01.21)
- [4. Issue 4](#4)(2021.01.28)
- [5. Issue 5](#5)(2021.02.03)
<a name="1"></a>
## Issue 1
### Q1.1: Why is the prediction accuracy of the exported inference model very low ?
**A**: You can check the following aspects:
- Check whether the path of the pre-training model is correct or not.
- The default class number is 1000 when exporting the model. If the pre-training model has a custom class number, you need to specify the parameter `--class_num=k` when exporting, where k is the custom class number.
- Compare the output class id and score of `tools/infer/infer.py` and `tools/infer/predict.py` for the same input. If they are exactly the same, the pre-trained model may have poor accuracy itself.
### Q1.2: How to deal with the unbalanced categories of training samples?
**A**: There are several commonly used methods.
- From the perspective of sampling
- The samples can be sampled dynamically according to the categories, with different sampling probabilities for each category and ensure that the number of training samples in different categories is basically the same or in the desired proportion in the same minibatch or epoch.
- You can use the oversampling method to oversample the categories with a small number of images.
- From the perspective of loss function
- The OHEM (online hard example mining) method can be used to filter the hard examples based on the loss of the samples for gradient backpropagation and parameter update of the model.
- The Focal loss method can be used to assign a smaller weight to the loss of some easy samples and a larger weight to the loss of hard samples, so that the loss of easy samples contributes to the overall loss of the network without dominating the loss.
### Q1.3 When training in docker, the data path and configuration are fine, but it keeps reporting `SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception`, why is this?
**A**:
This may be caused by the small shared memory in Docker. When a container is created, the default size of `/dev/shm` is 64M; if you use multiple processes to read data, the shared memory may fall short, so you need to allocate more space to `/dev/shm`. When creating the container, pass `--shm-size=8g` to allocate 8G to `/dev/shm`, which is usually enough.
### Q1.4 Where can I download the 100,000-class image classification pre-training model provided by PaddleClas, and how do I use it?
**A**:
Based on ResNet50_vd, Baidu has open-sourced its own large-scale classification pre-training model trained on 100,000 categories and 43 million images. The model is available at the [download address](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_10w_pretrained.tar). Note that this pre-training model does not provide the final FC layer parameters and thus cannot be used directly for inference, but it can be used as a pre-training model to fine-tune on your own dataset. It has been verified that this pre-training model brings a more significant accuracy gain, up to 30% on some datasets, compared with the ResNet50_vd pre-training model based on the ImageNet-1k dataset.
### Q1.5 How to accelerate when using C++ for inference deployment?
**A**: You can speed up the inference in the following ways.
1. For CPU inference, you can turn on mkldnn and increase the number of threads (cpu_math_library_num_threads, in `tools/config.txt`), which is usually 6~10.
2. For GPU inference, you can enable TensorRT inference and FP16 inference if the hardware allows, which can further improve the speed.
3. If the memory or video memory is sufficient, the batch size of inference can be increased.
4. The image preprocessing logic (mainly designed for resize, crop, normalize, etc.) can be run on the GPU, which can further speed up the process.
You are welcome to add more tips on inference deployment acceleration.
<a name="2"></a>
## Issue 2
### Q2.1: Does PaddleClas have to start from 0 when setting labels, and does class_num have to equal the number of classes in the dataset?
**A**:
In PaddleClas, labels start from 0 by default, so try to set your labels starting from 0. Of course, it is possible to start from other values, but this results in a larger class_num, which in turn leads to more FC layer parameters for the classifier, so the weight file will take up more storage space. For a continuous set of classes, set class_num equal to the number of classes in the dataset (it is also acceptable to set it greater than the number of classes; on many datasets this can even yield higher accuracy, at the cost of more FC layer parameters). For a discontinuous set of classes, class_num should be equal to the largest class_id+1 in the dataset.
### Q2.2: How to reduce the large storage size of the weight file caused by a huge FC layer when the number of classes is very large?
**A**:
The weight of the final FC layer is a large matrix of size C*class_num, where C is the number of neurons in the layer before the FC, e.g. C is 2048 in ResNet50. The size of the FC weight can be reduced by decreasing C, for example by adding a lower-dimensional FC layer after GAP, which can greatly reduce the weight size of the final classification layer; a minimal sketch is shown below.
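A minimal sketch of the idea is shown below: inserting a lower-dimensional fully connected layer (an embedding layer) between GAP and the final classifier shrinks the classifier weights from C*class_num to roughly embedding_dim*class_num; the sizes are illustrative.

```python
import paddle.nn as nn

C, embedding_dim, class_num = 2048, 256, 100000

# direct classifier: 2048 * 100000 ≈ 2.0e8 parameters
direct_head = nn.Linear(C, class_num)

# with an extra small FC after GAP: 2048*256 + 256*100000 ≈ 2.6e7 parameters
compact_head = nn.Sequential(
    nn.Linear(C, embedding_dim),
    nn.ReLU(),
    nn.Linear(embedding_dim, class_num))
```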
### Q2.3: Why did the training of ssld distillation on a custom dataset using PaddleClas fail to meet the expectation?
**A**: First, it is necessary to ensure the accuracy of the Teacher model. Second, it is necessary to ensure that the Student model is successfully loaded with the ImageNet-1k pre-trained weights and that the Teacher model is successfully loaded with the weights trained on the custom dataset. Finally, the initial learning rate should not be too large; at least it should be smaller than the value used for ImageNet-1k training.
### Q2.4: Which networks have advantages on mobile or embedded side?
**A**: It is recommended to use the mobile series of networks; details can be found in the [Introduction to Mobile Series Network Structure](../models/Mobile_en.md). If speed is the priority, the MobileNetV3 series can be considered; if model size matters more, the specific structure can be chosen based on the storage size vs. accuracy comparison in that introduction.
### Q2.5: Why use a network with large number of parameters and computation such as ResNet when the mobile network is so fast?
**A**: Different network structures have different speed advantages on different devices. On the mobile side, mobile series networks run faster than server-side networks, but on the server side, networks with specific optimizations, such as ResNet, have a greater advantage at the same accuracy. So the specific network structure needs to be decided on a case-by-case basis.
<a name="3"></a>
## Issue 3
### Q3.1: What are the characteristics of a double (multi)-branch structure and a Plain structure, respectively?
**A**:
Plain networks, represented by VGG, have evolved into multi-branch network structures, represented by the ResNet series (with residual modules) and the Inception series (with multiple convolution kernels in parallel). It has been found that the multi-branch structure is friendlier during training: a larger network width brings stronger feature-fitting ability, and the residual structure avoids the vanishing gradient problem of deep networks. However, in the inference phase, models with multi-branch structures have no speed advantage: even if their FLOPs are lower, their computational density is unsatisfactory. For example, the FLOPs of the VGG16 model are much larger than those of EfficientNetB3, yet the inference speed of the former is significantly faster than that of the latter. Therefore, the multi-branch structure is friendlier for training, while the Plain structure is more suitable for inference. Starting from this, we can use a multi-branch structure in the training phase, trading a larger training cost for a model with better feature-fitting ability, and convert the multi-branch structure to a Plain structure in the inference phase in exchange for shorter inference time. The conversion from a multi-branch structure to a Plain structure can be achieved with the structural re-parameterization technique.
In addition, the Plain structure is more friendly for pruning operations.
Note: The term "Plain structure" and "structural re-parameterization" are from the paper "RepVGG: Making VGG-style ConvNets Great Again". Plain structure network model means that there is no branching structure in the whole network, i.e., the input of layer `i` is the output of layer `i-1` and the output of layer `i` is the input of layer `i+1`.
### Q3.2: What are the main innovations of ACNet?
**A**: ACNet stands for "Asymmetric Convolution Block", and the idea comes from the paper "ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks", which proposes an "ACB" structure made up of three convolution kernels to replace the traditional square convolution kernels of existing convolutional neural networks during the training phase.
Assume the square convolution kernel has size `d*d`, i.e. its width and height are both `d`. The ACB structure that replaces this kernel consists of three convolutions with kernel sizes `d*d`, `1*d`, and `d*1`, whose outputs are added directly to obtain an output of the same size as that of the original square kernel. After training, the ACB structure is replaced by a single square convolution kernel whose parameters are the direct sum of the parameters of the three ACB kernels (see `Q3.4`), so the model structure used for inference is the same as before, and the ACB structure is only used in the training phase.
During the training, the network width of the model is improved by the ACB structure, and more features are extracted using the two asymmetric convolution kernels of `1*d` and `d*1` to enrich the information of the feature maps extracted by the `d*d` convolution kernels. In the inference stage, this design idea does not bring additional parameters and computational overhead. The following figure shows the form of convolutional kernels for the training phase and the inference deployment phase, respectively.
![](../../images/faq/TrainingtimeACNet.png)
![](../../images/faq/DeployedACNet.png)
Experiments by the authors of the article show that the model capability can be significantly improved by using ACNet structures in the training of the original network model, as explained by the original authors as follows.
1. Experiments show that for a `d*d` convolution kernel, the parameters at the skeleton positions (e.g. the `skeleton` positions of the kernel in the figure above) have a greater impact on model accuracy than the parameters at the corner positions (e.g. the `corners` positions in the figure above), so the skeleton parameters are essential, while removing the corner parameters hurts less. The two asymmetric convolution kernels in the ACB structure strengthen the weights at the skeleton positions of the square kernel, making them more significant. As for whether the summation might weaken the role of the skeleton parameters through cancellation of positive and negative values, the authors found experimentally that training always moves in the direction of strengthening the skeleton parameters, and no weakening due to cancellation occurs.
2. The asymmetric convolution kernels are more robust to flipped images. As shown in the following figure, the horizontal asymmetric kernel is more robust to up-and-down flips: for semantically identical positions in the image before and after the flip, the features extracted by the asymmetric kernel are the same, which is better than with the square convolution kernel.
![](../../images/faq/HorizontalKernel.png)
### Q3.3: What are the main innovations of RepVGG?
**A**:
From Q3.1 and Q3.2, it is natural to think of decoupling the training phase from the inference phase as ACNet does: use a multi-branch structure in the training phase and a Plain structure in the inference phase. This is the innovation of RepVGG. The following figure compares the network structures of ResNet and RepVGG in the training and inference phases.
![](../../images/faq/RepVGG.png)
First, RepVGG in the training phase adopts a multi-branch structure, which can be regarded as a residual structure with a `1*1` convolution and an identity mapping added on top of the traditional VGG network, while RepVGG in the inference phase degenerates to a VGG structure. The transformation from the training-phase structure to the inference-phase structure is implemented with the "structural re-parameterization" technique.
The identity mapping can be regarded as a `1*1` convolution with parameters equal to `1` acting on the input feature map, so the convolution module of RepVGG in the training phase can be regarded as two `1*1` convolutions plus one `3*3` convolution, and the parameters of the `1*1` convolutions can be added directly to the parameters at the center of the `3*3` convolution kernel (this operation is similar to adding the parameters of the asymmetric kernels to the skeleton positions of the square kernel in ACNet). In this way, the identity mapping, `1*1` convolution, and `3*3` convolution branches of the network can be merged into a single `3*3` convolution in the inference stage, as described in `Q3.4`.
### Q3.4: What are the similarities and differences between the struct re-parameters in ACNet and RepVGG?
**A**:
From the above, RepVGG can be simply understood as an extreme version of ACNet. The re-parameterization operation in ACNet is shown in the following figure.
![](../../images/faq/ACNetReParams.png)
Take `conv2` as an example: the asymmetric convolution can be regarded as a `3*3` square convolution kernel whose top and bottom rows (six parameters) are `0`, and the same holds for `conv3`. On top of that, summing the outputs of `conv1`, `conv2`, and `conv3` is equivalent to summing the three convolution kernels first and then convolving. With `Conv` denoting the convolution operation and `+` denoting matrix addition, we have `Conv1(A)+Conv2(A)+Conv3(A) == Convk(A)`, where `Conv1`, `Conv2`, and `Conv3` have kernels `Kernel1`, `Kernel2`, and `Kernel3`, and `Convk` has the kernel `Kernel1 + Kernel2 + Kernel3`.
The RepVGG network works in the same way as ACNet, except that the `1*d` asymmetric convolutions of ACNet become `1*1` convolutions, and the position where the `1*1` convolutions are summed becomes the center of the `3*3` convolution. A small numeric check of this equivalence is sketched below.
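The following sketch numerically checks this kernel-addition equivalence with plain Paddle convolutions: padding the asymmetric kernels into the skeleton of a 3x3 kernel and summing gives the same output as summing the three branch outputs (biases and BN are omitted for simplicity).

```python
import paddle
import paddle.nn.functional as F

x = paddle.randn([1, 4, 8, 8])
k3x3 = paddle.randn([4, 4, 3, 3])  # square kernel
k1x3 = paddle.randn([4, 4, 1, 3])  # horizontal asymmetric kernel
k3x1 = paddle.randn([4, 4, 3, 1])  # vertical asymmetric kernel

# three branches, each padded so the spatial size stays 8x8
y_branches = (F.conv2d(x, k3x3, padding=1)
              + F.conv2d(x, k1x3, padding=[0, 1])
              + F.conv2d(x, k3x1, padding=[1, 0]))

# re-parameterization: zero-pad the asymmetric kernels onto the 3x3 skeleton and sum the kernels
k_merged = k3x3 + F.pad(k1x3, [0, 0, 1, 1]) + F.pad(k3x1, [1, 1, 0, 0])
y_merged = F.conv2d(x, k_merged, padding=1)

print(paddle.allclose(y_branches, y_merged, atol=1e-5))  # True
```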
### Q3.5: What are the factors that affect the computation speed of a model? Does a model with a larger number of parameters necessarily have a slower computation speed?
**A**:
Many factors affect the computation speed of a model, and the number of parameters is only one of them. Specifically, without considering hardware differences, the computation speed of a model can be analyzed from the following aspects.
1. Number of parameters: this measures the size of the model; the more parameters, the higher the memory (video memory) requirement during computation. However, the memory (video memory) footprint does not depend entirely on the number of parameters. In the figure below, assuming the input feature map occupies `1` unit of memory, the peak memory footprint of the residual structure on the left during computation is twice that of the Plain structure on the right, because the outputs of both branches need to be kept before being added together.
![](../../images/faq/MemoryOccupation.png)
2. FLOPs: Note that FLOPs are distinguished from floating point operations per second (FLOPS). FLOPs can be simply understood as the amount of computation and are usually used to measure the computational complexity of a model. Taking the common convolution operation as an example, and ignoring batch size, activation function, stride, and bias, assume the input feature map size is `Min*Min` with `Cin` channels, the output feature map size is `Mout*Mout` with `Cout` channels, and the convolution kernel size is `K*K`. Then the FLOPs of a single convolution can be calculated as follows (a small helper is sketched after this list).
1. The number of feature points contained in the output feature map is: `Cout * Mout * Mout`.
1. For the convolution operation for each feature point in the output feature map: the number of multiplication calculations is: `Cin * K * K`; the number of addition calculations is: `Cin * K * K - 1`.
1. So the total number of computations is: `Cout * Mout * Mout * (Cin * K * K + Cin * K * K - 1)`, i.e. `Cout * Mout * Mout * (2Cin * K * K - 1)`.
3. Memory Access Cost (MAC): The computer needs to read the data from memory (general memory, including video memory) to the operator's Cache before performing operations on data (such as multiplication and addition), and the memory access is very time-consuming. Take grouped convolution as an example, suppose it is divided into `g` groups, although the number of parameters and FLOPs of the model remain unchanged after grouping, the number of memory accesses for grouped convolution becomes `g` times of the previous one (this is a simple calculation without considering multi-level Cache), so the MAC increases significantly and the computation speed of the model slows down accordingly.
4. Parallelism: The term parallelism often includes data parallelism and model parallelism, in this case, model parallelism. Take convolutional operation as an example, the number of parameters in a convolutional layer is usually very large, so if the matrix in the convolutional layer is chunked and then handed over to multiple GPUs separately, the purpose of acceleration can be achieved. Even some network layers with too many parameters for a single GPU memory may be divided into multiple GPUs, but whether they can be divided into multiple GPUs in parallel depends not only on hardware conditions, but also on the specific form of operation. Of course, the higher the degree of parallelism, the faster the model can run.
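The FLOPs count in item 2 can be restated as a small Python sketch (the example layer sizes are made up):

```python
def conv_flops(c_in, c_out, m_out, k):
    # Each of the C_out * M_out * M_out output feature points needs
    # C_in * K * K multiplications and C_in * K * K - 1 additions.
    per_point = c_in * k * k + (c_in * k * k - 1)      # = 2 * C_in * K * K - 1
    return c_out * m_out * m_out * per_point

# Example: a 3*3 convolution mapping a 64-channel 56x56 feature map to 128 channels.
print(conv_flops(c_in=64, c_out=128, m_out=56, k=3))
```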
<a name="4"></a>
## Issue 4
### Q4.1: If an image classification task contains some synthetic data, is it necessary to use sample equalization?
**A**:
1. If the number of samples varies greatly across categories, and synthetic data expands one category to several times the size of the others, it is necessary to reduce the weight of that category appropriately.
2. If some categories are fully synthetic and others are half synthetic and half real, equalization is not needed as long as the numbers are of the same order of magnitude. You can also simply train and test whether the synthetic categories can be accurately identified.
3. If adding synthetic data degrades the performance on categories that come from other data sources, you need to check whether the synthetic data contains noise or hard samples. You can also properly increase the weight of the affected categories to obtain better recognition performance on them.
### Q4.2: What new opportunities and challenges will be brought by the introduction of Vision Transformer (ViT) into the field of image classification by academia? What are the advantages over CNN?
Paper address: [AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE](https://openreview.net/pdf?id=YicbFdNTTy)
**A**:
1. Image tasks do not have to depend on CNNs, and the computational efficiency and scalability of the Transformer allow very large models to be trained without saturation as the model and the dataset grow. Inspired by the Transformer in NLP, when used for image classification, an image is divided into a sequence of patches, which are linearly embedded and fed into the Transformer as input.
2. On medium-sized datasets such as ImageNet-1k and ImageNet-21k, the Vision Transformer is several percentage points lower in accuracy than a ResNet of comparable size. It is speculated that this is because the Transformer lacks the locality and spatial invariance that CNNs have, so it is difficult to outperform convolutional networks when the amount of data is not large enough. For this problem, the data augmentation adopted by [DeiT](https://arxiv.org/abs/2012.12877) to some extent reduces the reliance of the Vision Transformer on very large training datasets.
3. When trained on super large-scale datasets (14M-300M images), this approach can go beyond local information and model longer-range dependencies, while a CNN focuses better on local information but is weaker at capturing global information.
4. The Transformer has long dominated NLP, but was questioned as not being applicable to CV. The recent vision papers deliver performance competitive with the SOTA of CNNs. We believe that joint vision-language or multimodal models will be proposed that can solve both visual and linguistic problems.
### Q4.3: For the Vision Transformer model, how is the image converted into sequence information for the Encoder?
**A**:
1. The Transformer model mainly relies on the attention mechanism and is designed for scenarios where semantic embeddings apply. However, image classification is not closely tied to the semantic information of sequences, so the Vision Transformer has its own unique design, and it is precisely the goal of ViT to replace CNNs with attention mechanisms.
2. Consider the input form of the Encoder in the Transformer, as shown below:
- (1) Variable-length sequential input: in an NLP scenario, sentences contain different numbers of words and a change in word order affects the semantics relatively little; in images, however, position matters a lot, since connecting different regions in a different order causes great misunderstanding.
- (2) Each patch is transformed into a vector of fixed dimension: the Encoder input is the embedding of the patch's pixel information, combined with a fixed position vector, yielding a fixed-dimensional vector that also carries position information.
![](../../images/faq/Transformer_input.png)
3. Consider the following question: How to pass an image to an encoder?
- As the following figure shows, suppose the input image is [224,224,3]; it is cut into many patches in order from left to right and top to bottom, and each patch has size [p,p,3] (p can be 16 or 32). Each patch is converted into a feature vector by the Linear Projection of Flattened Patches module, combined with a position vector, and then fed into the Encoder.
![](../../images/faq/ViT_structure.png)
4. As shown above, given an image of `H×W×C` and a patch size `P`, the image can be divided into `N` patches of `P×P×C`, where `N=H×W/(P×P)`. After obtaining the patches, a linear transformation converts them into D-dimensional feature vectors, and position encoding vectors are then added. Similar to BERT, ViT also prepends a classification flag bit to the sequence, denoted as `[CLS]`. The ViT input sequence `z` is shown in the following equation, where `x` represents an image patch. A minimal sketch of this patchify-and-embed step is given after this list.
![](../../images/faq/ViT.png)
5. The ViT model is basically the same as the Transformer: the input sequence is passed into ViT and the final output feature corresponding to the `[CLS]` flag is used for classification. ViT consists mainly of MSA (multi-head self-attention) and MLP (a two-layer fully connected network with the GELU activation function), with LayerNorm before and residual connections around the MSA and MLP blocks.
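As a reading aid, here is a minimal NumPy sketch of the patchify-and-embed step described in items 3 and 4 (the projection matrix, position embeddings, and `[CLS]` token below are random or zero placeholders, not trained parameters):

```python
import numpy as np

H, W, C, P, D = 224, 224, 3, 16, 768
img = np.random.rand(H, W, C).astype("float32")

# 1. Cut the image into N = H*W/(P*P) non-overlapping P*P*C patches and flatten them.
patches = img.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(-1, P * P * C)               # [N, P*P*C], N = 196

# 2. Linear projection of flattened patches into D-dimensional vectors.
W_proj = np.random.rand(P * P * C, D).astype("float32")
tokens = patches @ W_proj                               # [N, D]

# 3. Prepend the [CLS] token and add position embeddings.
cls_token = np.zeros((1, D), dtype="float32")
z = np.concatenate([cls_token, tokens], axis=0)         # [N+1, D]
pos_embed = np.random.rand(z.shape[0], D).astype("float32")
z = z + pos_embed                                       # Encoder input sequence
print(z.shape)                                          # (197, 768)
```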
### Q4.4: How to understand Inductive Bias?
**A**:
1. In machine learning, certain assumptions are made about the problem to be solved, and such an assumption is called an inductive bias (inductive preference). A priori rules are induced from phenomena observed in real life and then imposed as constraints on the model, thus playing the role of model selection. In CNNs, features are assumed to have locality and spatial invariance, that is, adjacent features are related while distant ones are not, and fusing adjacent features makes it easier to produce a "solution"; the attention mechanism is likewise a rule induced from human intuition and life experience.
2. The inductive bias of the Vision Transformer is linked to sequentiality and time invariance, i.e., the temporal order of the sequence, and it therefore also yields better performance than CNN-like models on larger datasets. From the statement in the Conclusion of the paper, "Unlike prior works using self-attention in computer vision, we do not introduce any image-specific inductive biases into the architecture", and "We find that large scale training trumps inductive bias" in the Introduction, we can conclude that, intuitively, hand-crafted inductive biases degrade performance when large amounts of data are available and should be discarded whenever possible.
### Q4.5: Why does ViT add a [CLS] flag bit? Why is the vector corresponding to the [CLS] used as the semantic representation of the whole sequence?
**A**:
1. Similar to BERT, ViT adds a `[CLS]` flag bit before the first patch, and the output vector corresponding to this flag bit at the last layer can be used as a semantic representation of the whole image, and thus for downstream classification tasks, etc. In this way, the remaining patch embeddings characterize the features of the image at their respective locations.
2. The vector corresponding to the `[CLS]` flag bit is used as the semantic representation of the whole image because this symbol, which carries no obvious semantic information of its own, integrates the semantic information of every patch in the image more "fairly" than any individual patch, and thus better represents the semantics of the whole image.
<a name="5"></a>
## Issue 5
### Q5.1: What is included in the PaddleClas training profile? How to modify during the model training?
**A**: The PaddleClas training configuration file contains 6 modules, namely: global configuration, network architecture, learning rate, optimizer, training, and validation.
The global configuration contains information about the configuration of the task, such as the number of categories, the amount of data in the training set, the number of epochs to train, the size of the network input, etc. If you want to train a custom task or use your own training set, please pay attention to this section.
The configuration of the network structure defines the network to be used. In practice, the first step is to select the appropriate configuration file, so this part of the configuration is usually not modified. Modifications are only made when customizing the network structure, or when there are special requirements for the task.
It is recommended to use the default configuration for the learning rate and optimizer. These parameters are the ones that are already tuned. Fine-tuning can also be done if the changes to the task are significant.
The training and validation configurations include batch_size, dataset, data transforms (transforms), number of workers (num_workers) and other important settings, which should be modified according to the actual environment. Note that the batch_size in PaddleClas is a per-card setting: for multi-card training, the total batch_size is a multiple of the one set in the configuration file. For example, if the configuration file sets batch_size to 64 and 4 cards are used, the total batch_size is 4*64=256. num_workers defines the number of processes on a single card, i.e., if num_workers is 8 with 4 cards for training, there are actually 32 workers.
### Q5.2: How to quickly modify the configuration in the command line?
**A**:
During training, we often need to fine-tune individual settings without repeatedly modifying the configuration file. This can be done with the `-o` option: write the name of the configuration item to be changed level by level, splitting the levels with dots, followed by the new value. For example, to modify batch_size, we can append `-o DataLoader.TRAIN.sampler.batch_size=512` to the training command, as in the example below.
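For example, assuming the standard training entry point `tools/train.py` and an illustrative config path (the exact file and the casing of the field names may differ between PaddleClas versions), a command could look like this:

```
python3 tools/train.py \
    -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml \
    -o DataLoader.TRAIN.sampler.batch_size=512
```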
### Q5.3: How to choose the right model according to the accuracy curve of PaddleClas?
**A**:
PaddleClas provides benchmarks for several models and plots performance curves. There are mainly three kinds: the accuracy-inference time curve, the accuracy-parameter count curve and the accuracy-FLOPs curve, with accuracy on the vertical axis and the other quantity on the horizontal axis. In general, different models perform consistently across the three plots. Models of the same series are represented using the same symbols and connected by curves.
Taking the accuracy-inference time curve as an example, a higher point indicates higher accuracy, and a point further to the left indicates faster speed. For example, models in the upper-left region are fast and accurate, while the leftmost points close to the vertical axis are lightweight models. In use, you can choose the right model by considering both accuracy and time. For example, if we want the most accurate model that runs in under 10ms, first draw a vertical line at 10ms on the horizontal axis, and then find the highest point to the left of that line, which is the model that meets the requirement.
In practice, the number of parameters and FLOPs of a model are fixed, while the running time varies under different hardware and software conditions. If you want to choose the model more precisely, you can run the test in your own environment and draw the corresponding performance graph.
### Q5.4: If I want to add two classes in imagenet, can I fix the parameters of the existing fully connected layer and only train the new two?
**A**:
This idea works in theory, but it will probably not work well. If only the fully-connected layer is fixed while the parameters of the preceding convolutional layers change, there is no guarantee that the fully-connected layer will behave as it did originally. If instead the parameters of the entire network are frozen and only the fully-connected weights for the two new categories are trained, it is also difficult to reach the desired results.
If you really need the original 1000 categories to stay accurate, you can add the data of the new categories to the original training set and then finetune from the pre-trained model. If you only need a few of the 1000 categories, you can pick out that part of the data, mix it with the new data, and then finetune.
### Q5.5: When using classification models as pre-training models for other tasks, which layers should be selected as features?
**A**:
There are many strategies for using classification models as the backbone of other tasks; a basic approach is presented here. First, remove the final fully-connected layer, which mainly contains the classification information of the original task. If the task is relatively simple, just use the output of the previous layer as the feature map and add the task-specific structure on top of it. If the task involves multiple scales and anchors of different scales need to be selected, as in some detection models, the output of the layer before each downsampling can be selected as the feature map.
@ -0,0 +1,356 @@
# PaddleClas FAQ Summary - 2021 Season 2
## Before You Read
- We collect here some frequently asked questions from issues and user groups since PaddleClas was open-sourced and provide brief answers, aiming to offer a reference and save you from unnecessary detours.
- Image classification, recognition and retrieval are fast-moving fields with quickly updated models and papers, and the answers here mainly rely on our limited project practice, so they cannot cover every facet. We sincerely hope that knowledgeable readers will help supplement and correct the content; thanks a lot.
## Catalogue
- [1. Theory](#1)
- [1.1 Basic Knowledge of PaddleClas](#1.1)
- [1.2 Backbone Network and Pre-trained Model Library](#1.2)
- [1.3 Image Classification](#1.3)
- [1.4 General Detection](#1.4)
- [1.5 Image Recognition](#1.5)
- [1.6 Vector Search](#1.6)
- [2. Practice](#2)
- [2.1 Common Problems in Training and Evaluation](#2.1)
- [2.2 Image Classification](#2.2)
- [2.3 General Detection](#2.3)
- [2.4 Image Recognition](#2.4)
- [2.5 Vector Search](#2.5)
- [2.6 Model Inference Deployment](#2.6)
<a name="1"></a>
## 1. Theory
<a name="1.1"></a>
### 1.1 Basic Knowledge of PaddleClas
#### Q1.1.1 Differences between PaddleClas and PaddleDetection
**A**: PaddleClas is an image recognition repo that integrates mainbody detection, image classification, and image retrieval to solve most image recognition problems. It can be easily adopted by users to solve small sample and multi-category issues in the field. PaddleDetection provides the ability of target detection, keypoint detection, multi-target tracking, etc., which is convenient for users to locate the points and regions of interest in images, and is widely used in industrial quality inspection, remote sensing image detection, unmanned inspection and other projects.
#### Q1.1.3: What does the parameter momentum mean in the Momentum optimizer?
**A**:
Momentum optimizer is based on SGD optimizer and introduces the concept of "momentum". In the SGD optimizer, the update of the parameter `w` at the time `t+1` can be expressed as
```
w_t+1 = w_t - lr * grad
```
`lr` is the learning rate and `grad` is the gradient of the parameter `w` at this point. With the introduction of momentum, the update of the parameter `w` can be expressed as
```
v_t+1 = m * v_t + lr * grad
w_t+1 = w_t - v_t+1
```
Here `m` is the `momentum`, the weighting coefficient of the accumulated momentum, generally taken as `0.9`. Since this value is less than `1`, the earlier a gradient is, the smaller its impact on the current update. For example, when the momentum parameter `m` is `0.9`, at time `t` the weight of the gradient from time `t-5` is `0.9 ^ 5 = 0.59049`, while that from time `t-2` is `0.9 ^ 2 = 0.81`. Intuitively, gradient information that is too "far away" is of little significance to the current update, while "recent" historical gradient information matters more.
![](../../images/faq/momentum.jpeg)
By introducing the concept of momentum, the effect of historical updates is taken into account when updating the parameters, which speeds up convergence and reduces the loss oscillation caused by the plain `SGD` optimizer. A toy sketch of the two update rules is shown below.
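The following toy Python sketch simply applies the two update rules above to a made-up one-dimensional objective, so that plain SGD and momentum can be compared side by side:

```python
def grad(w):
    return 2.0 * w                         # gradient of the toy objective f(w) = w^2

lr, m = 0.1, 0.9
w_sgd, w_mom, v = 5.0, 5.0, 0.0
for _ in range(20):
    w_sgd = w_sgd - lr * grad(w_sgd)       # plain SGD: w_t+1 = w_t - lr * grad
    v = m * v + lr * grad(w_mom)           # momentum:  v_t+1 = m * v_t + lr * grad
    w_mom = w_mom - v                      #            w_t+1 = w_t - v_t+1
print(w_sgd, w_mom)
```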
#### Q1.1.4: Does PaddleClas have an implementation of the paper `Fixing the train-test resolution discrepancy`?
**A**: Currently, it is not implemented. If needed, you can try to modify the code yourself. In brief, the idea proposed in this paper is to fine-tune the final FC layer of the trained model using a larger input resolution. Specifically, train the network on a lower resolution first, then set `stop_gradient=True` for the weights of all layers except the final FC layer, and finally fine-tune the network with larger-resolution inputs, as sketched below.
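A rough sketch of this idea with the Paddle API might look as follows (the backbone, resolution, and hyperparameters are illustrative; in `paddle.vision`'s ResNet the final classifier is named `fc`, so adjust the name for other networks):

```python
import paddle
from paddle.vision.models import resnet50

# Illustrative sketch: freeze everything except the final FC layer, then
# fine-tune with a larger input resolution (e.g. 384 instead of 224).
model = resnet50(num_classes=1000)
for name, param in model.named_parameters():
    param.stop_gradient = not name.startswith("fc")   # only the FC layer keeps gradients

optimizer = paddle.optimizer.Momentum(
    learning_rate=0.001, momentum=0.9,
    parameters=[p for p in model.parameters() if not p.stop_gradient])

x = paddle.randn([8, 3, 384, 384])                    # larger-resolution dummy batch
labels = paddle.randint(0, 1000, [8])
loss = paddle.nn.CrossEntropyLoss()(model(x), labels)
loss.backward()
optimizer.step()
optimizer.clear_grad()
```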
<a name="1.2"></a>
### 1.2 Backbone Network and Pre-trained Model Library
<a name="1.3"></a>
### 1.3 Image Classification
#### Q1.3.1: Does PaddleClas provide data enhancement for adjusting image brightness, contrast, saturation, hue, etc.?
**A**
PaddleClas provides a variety of data augmentation methods, which can be divided into 3 categories.
1. Image transformation: AutoAugment, RandAugment;
2. Image cropping: CutOut, RandErasing, HideAndSeek, GridMask;
3. Image aliasing: Mixup, Cutmix.
Among them, RandAugment provides a variety of random combinations of data augmentation methods, which can meet the needs of brightness, contrast, saturation, hue and other aspects.
<a name="1.4"></a>
### 1.4 General Detection
#### Q1.4.1 Does the mainbody detection only export one subject detection box at a time?
**A**: The number of mainbody detection outputs is configurable through the configuration file: `Global.threshold` controls the detection threshold, so boxes whose scores are below this threshold are discarded, and `Global.max_det_results` controls the maximum number of results returned. The two together determine the number of output detection boxes.
#### Q1.4.2 How is the data selected for training the mainbody detection model? Will it harm the accuracy to switch to a smaller model?
**A**
The training data is a randomly selected subset of publicly available datasets such as COCO, Object365, RPC, and LogoDet. We are currently introducing an ultra-lightweight mainbody detection model in version 2.3, which can be found in [Mainbody Detection](../../en/image_recognition_pipeline/mainbody_detection_en.md#2-model-selection).
#### Q1.4.3: Are there any false detections in some scenarios with the current mainbody detection model?
**A**: The current mainbody detection model is trained using publicly available datasets such as COCO, Object365, RPC, LogoDet, etc. If the data to be detected is similar to industrial quality inspection or other data that differs greatly from common categories, it is necessary to fine-tune the detection model again on such data.
<a name="1.5"></a>
### 1.5 Image Recognition
#### Q1.5.1 Is `triplet loss` needed for `circle loss` ?
**A**
`circle loss` is a unified form of pair-wise learning and classification learning; if the classification-learning form is used, `triplet loss` can be added on top of it.
#### Q1.5.2 If the target to be recognized does not fall into the four open-source recognition directions, which model works better?
**A**
The product recognition model is recommended. First, the range of products is broad, so the image to be recognized is more likely to be a product. Second, the product recognition model is trained on data covering 50,000 categories, so it has better generalization ability and more robust features.
#### Q1.5.3 Why is a 512-dimensional vector adopted instead of 1024 or another size?
**A**
Vectors of lower dimension are preferred; in practice, 128 or even smaller dimensions are used to speed up computation. In general, a dimension of 512 is large enough to adequately represent the features.
<a name="1.6"></a>
### 1.6 Vector Search
#### Q1.6.1 Does the Möbius vector search algorithm currently used by PaddleClas support index.add() similar to the one used by faiss? Also, do I have to train every time I build a new graph? Is the train here to speed up the search or to build similar graphs?
**A**: The faiss retrieval module is now supported in the release/2.3 branch, and Möbius is no longer supported. Möbius provides a graph-based approximate nearest neighbor search algorithm and currently supports two types of distance calculation: inner product and L2 distance. However, Möbius does not support the index.add function provided by faiss, so if you need to add content to the search library, you have to rebuild the index from scratch. The algorithm internally performs a train-like process each time the index is built, which is different from the train interface provided by faiss. Therefore, if you need the faiss module, use the release/2.3 branch; if you need Möbius, fall back to the release/2.2 branch for now.
#### Q1.6.2: What exactly are the `Query` and `Gallery` configurations used for in the PaddleClas image recognition for Eval configuration file?
**A**:
Both `Query` and `Gallery` are data set configurations, where `Gallery` is used to configure the base library data and `Query` is used to configure the validation set. When performing Eval, the model is first used to forward compute feature vectors on the `Gallery` base library data, which are used to construct the base library, and then the model forward computes feature vectors on the data in the `Query` validation set, and then computes metrics such as recall rate in the base library.
<a name="2"></a>
## 2. Practice
<a name="2.1"></a>
### 2.1 Common Problems in Training and Evaluation
#### Q2.1.1 Where is the `train_log` file in PaddleClas?
**A**: `train.log` is stored in the path where the weights are stored.
#### Q2.1.2 Why does the model output `nan` during training?
**A**: 1. Make sure the pre-trained model is loaded correctly; the easiest way is to add the parameter `-o Arch.pretrained=True`. 2. When fine-tuning the model, the learning rate should not be too large, e.g. use 0.001.
#### Q2.1.3 Is it possible to perform frame-by-frame prediction in a video?
**A**: Yes, but currently PaddleClas does not support video input. You can try to modify the code of PaddleClas, or extract the video frames first and then feed them into PaddleClas.
#### Q2.1.4: In data preprocessing, what setting can be adopted without cropping the input data? Or how to set the size of the crop?
**A**: The data preprocessing operators supported by PaddleClas can be viewed in `ppcls/data/preprocess/__init__.py`, and all supported operators can be configured in the configuration file. The name of the operator needs to be the same as the operator class name, and the parameters need to be the same as the constructor parameters of the corresponding operator class. If you do not need to crop the image, you can remove `CropImage` and `RandCropImage` and replace them with `ResizeImage`, and you can set different resize methods with its parameters by using the `size` parameter to directly scale the image to a fixed size, and using the `resize_short` parameter to maintain the image aspect ratio for scaling. To set the crop size, use the `size` parameter of the `CropImage` operator, or the `size` parameter of the `RandCropImage` operator.
#### Q2.1.5: Why do I get a usage error after PaddlePaddle installation and cannot import any modules under paddle (import paddle.xxx)?
**A**:
You can first test if Paddle is installed correctly by using the following code.
```
import paddle
paddle.utils.install_check.run_check()
```
When installed correctly, the following prompts will be displayed.
```
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
```
Otherwise, the relevant error message will be displayed. Also, if you install both the CPU and the GPU versions of Paddle, you will need to uninstall both and reinstall only the required version, due to conflicts between the two.
#### Q2.1.6: How to save the optimal model during training?
**A**:
PaddleClas saves/updates the following three types of models during training.
1. the latest model (`latest.pdopt`, `latest.pdparams`, `latest.pdstates`), which can be used to resume training when it is unexpectedly interrupted.
2. the best model (`best_model.pdopt`, `best_model.pdparams`, `best_model.pdstates`).
3. breakpoints at the end of an epoch during training (`epoch_xxx.pdopt`, `epoch_xxx.pdparams`, `epoch_xxx.pdstates`). The `Global.save_interval` field in the training profile indicates the save interval for this model. If you make it larger than the total number of epochs, intermediate breakpoint models will no longer be saved.
#### Q2.1.7: How to address the `ERROR: Unexpected segmentation fault encountered in DataLoader workers.` during training?
**A**: Try setting the field `num_workers` in the training profile to `0`; try making the field `batch_size` in the training profile smaller; ensure that the dataset format and the dataset path in the profile are correct.
#### Q2.1.8: How to use `Mixup` and `Cutmix` during training?
**A**
- For `Mixup`, please refer to [Mixup](../../../ppcls/configs/ImageNet/DataAugment/ResNet50_Mixup.yaml#L63-L65); and for `Cutmix`, please refer to [Cutmix](../../../ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L63-L65).
- The training accuracy (Acc) metric cannot be calculated when using `Mixup` or `Cutmix` for training, so you need to remove the `Metric.Train.TopkAcc` field in the configuration file, please refer to [Metric.Train.TopkAcc](../../../ppcls/configs/ImageNet/DataAugment/ResNet50_Cutmix.yaml#L125-L128).
#### Q2.1.9: What are the fields `Global.pretrain_model` and `Global.checkpoints` used for in the training configuration file yaml?
**A**
- When `fine-tune` is required, the path of the file of pre-training model weights can be configured via the field `Global.pretrain_model`, which usually has the suffix `.pdparams`.
- During training, the training program automatically saves the breakpoint information at the end of each epoch, including the optimizer information `.pdopt` and model weights information `.pdparams`. In the event that the training process is unexpectedly interrupted and needs to be resumed, the breakpoint information file saved during training can be configured via the field `Global.checkpoints`, for example `checkpoints: ./output/ResNet18/epoch_18` to restore the breakpoint information saved at the end of epoch 18. PaddleClas will automatically load `epoch_18.pdopt` and `epoch_18.pdparams` and continue training from epoch 19.
<a name="2.2"></a>
### 2.2 Image Classification
#### Q2.2.1 In SSLD, how is the small model obtained by first pre-training and distilling the large model on about 5 million images and then fine-tuning on about 1 million images?
**A**: The steps are as follows:
1. Obtain the `ResNet50-vd` model by distillation from the open-source `ResNeXt101-32x16d-wsl` model from Facebook.
2. Use this `ResNet50-vd` to distill `MobileNetV3` on the 5-million-image dataset.
3. Considering that the distribution of the 5-million-image dataset is not exactly the same as that of the 1-million-image data, the model is finetuned on the 1-million-image data to slightly improve the accuracy.
#### Q2.2.2 nan appears in loss when training SwinTransformer
**A**: When training SwinTransformer, please use `Paddle` `2.1.1` or above, and load the pre-trained model we provide. Also, the learning rate should be kept at an appropriate level.
<a name="2.3"></a>
### 2.3 General Detection
#### Q2.3.1 Why is the whole original image sometimes returned as a detection result?
**A**: The mainbody detection model returns detection boxes, but in fact, to make the subsequent recognition more accurate, the original image is also returned along with the detection boxes. Subsequently, the original image and the detection boxes are ranked according to their similarity with the images in the library, and the label of the library image with the highest similarity becomes the label of the recognized image.
#### Q2.3.2 Which detection model is recommended for real-time scenarios?
**A**: Real-time detection places high requirements on detection speed. PP-YOLO is a lightweight object detection model provided by the Paddle team that strikes a good balance between detection speed and accuracy, so you can try PP-YOLO for detection. For its usage, you can refer to [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/README_cn.md).
#### Q2.3.3: For unknown labels, adding gallery dataset can be used for subsequent classification recognition (without training), but if the previous detection model cannot locate and detect the unknown labels, is it still necessary to train the previous detection model?
**A**: If the detection model does not perform well on your own dataset, you need to finetune it again on your own detection dataset.
<a name="2.4"></a>
### 2.4 Image Recognition
#### Q2.4.1: Why is `Illegal instruction` reported during the recognition inference?
**A**: If you are using the release/2.2 branch, it is recommended to update it to the release/2.3 branch, where we replaced the Möbius search module with the faiss search module, as described in [Vector Search Tutorial](../image_recognition_pipeline/vector_search_en.md). If you still have problems, you can contact us in the WeChat group or raise an issue on GitHub.
#### Q2.4.2: How can recognition models be fine-tuned to train on the basis of pre-trained models?
**A**: The fine-tuning training of the recognition model is similar to that of the classification model. The recognition model can be loaded with the pre-trained product recognition model, and the training process can be found in [recognition model training](../../models_training/recognition_en.md); we will continue to refine the documentation.
#### Q2.4.3: Why does it fail to run all mini-batches in each epoch when training metric learning?
**A**: When training metric learning, the Sampler used is DistributedRandomIdentitySampler, which does not sample all the images; as a result each epoch samples only part of the data, so it is normal that the displayed number of mini-batches is not fully run through. This issue has been optimized in the release/2.3 branch; please update to release/2.3 to use it.
#### Q2.4.4: Why do some images have no recognition results?
**A**: In the configuration file (e.g. inference_product.yaml), `IndexProcess.score_thres` controls the minimum value of cosine similarity of the recognized image to the image in the library. When the cosine similarity is less than this value, the result will not be printed. You can adjust this value according to your actual data.
<a name="2.5"></a>
### 2.5 Vector Search
#### Q2.5.1: Why is the error `assert text_num >= 2` reported after adding an image to the index?
**A**: Make sure that the image path and the image name in data_file.txt are separated by a single tab instead of a space.
#### Q2.5.2: Do I need to rebuild the index to add new base data?
**A**: Starting from the release/2.3 branch, we have replaced the Möbius search module with the faiss search module, which already supports the addition of base data without rebuilding the base library, as described in [Vector Search Tutorial](../image_recognition_pipeline/vector_search_en.md).
#### Q2.5.3: How to deal with the reported error clang: error: unsupported option '-fopenmp' when recompiling index.so in Mac?
**A**
If you are using the release/2.2 branch, it is recommended to update it to the release/2.3 branch, where we replaced the Möbius search model with the faiss search module, as described in [Vector Search Tutorial](../image_recognition_pipeline/vector_search_en.md). If you still have problems, you can contact us in the user WeChat group or raise an issue on GitHub.
#### Q2.5.4: How to set the parameter `pq_size` when build searches the base library?
**A**
`pq_size` is a parameter of the PQ search algorithm, which can be simply understood as a "tiered" search algorithm. And `pq_size` is the "capacity" of each tier, so the setting of this parameter will affect the performance. However, in the case that the total data volume of the base library is not too large (less than 10,000), this parameter has little impact on the performance. So for most application scenarios, there is no need to modify this parameter when building the base library. For more details on the PQ search algorithm, see the related [paper](https://lear.inrialpes.fr/pubs/2011/JDS11/jegou_searching_with_quantization.pdf).
<a name="2.6"></a>
### 2.6 Model Inference Deployment
#### Q2.6.1: How to add the parameter of a module that is enabled by hub serving?
**A**: See [hub serving parameters](../../../deploy/hubserving/clas/params.py) for more details.
#### Q2.6.2: Why is the result not accurate enough when exporting the inference model for inference deployment?
**A**:
This problem is usually caused by the incorrect loading of the model parameters when exporting. First check the export log for something like the following.
```
UserWarning: Skip loading for ***. *** is not found in the provided dict.
```
If it exists, the model weights were not loaded successfully. Please further check the `Global.pretrained_model` field in the configuration file to see if the path of the model weights file is correctly configured. The suffix of the model weights file is usually `pdparams`, note that the file suffix is not required when configuring this path.
#### Q2.6.3: How to convert the model to `ONNX` format?
**A**
Paddle supports two ways of converting a model to ONNX, both relying on the `paddle2onnx` tool, so `paddle2onnx` needs to be installed first.
```
pip install paddle2onnx
```
- From inference model to ONNX format model:
Take the `combined` format inference model (containing `.pdmodel` and `.pdiparams` files) exported from the dynamic graph as an example, run the following command to convert the model format:
```
paddle2onnx --model_dir ${model_path} --model_filename ${model_path}/inference.pdmodel --params_filename ${model_path}/inference.pdiparams --save_file ${save_path}/model.onnx --enable_onnx_checker True
```
In the above command:
- `model_dir`: this parameter needs to contain `.pdmodel` and `.pdiparams` files.
- `model_filename`: this parameter is used to specify the path of the `.pdmodel` file under the parameter `model_dir`.
- `params_filename`: this parameter is used to specify the path of the `.pdiparams` file under the parameter `model_dir`.
- `save_file`: this parameter is used to specify the path of the converted ONNX model file.
For the conversion of a non-`combined` format inference model exported from a static graph (usually containing the file `__model__` and multiple parameter files), and for more parameter descriptions, please refer to the official documentation of [paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README.md#parameters).
- Exporting ONNX format models directly from the model definition code.
Take dynamic-graph model definition code as an example: the model class is a subclass of `paddle.nn.Layer`, and the code is shown below:
```python
import paddle
from paddle.static import InputSpec

class SimpleNet(paddle.nn.Layer):
    def __init__(self):
        super().__init__()
        # A single convolution layer as a minimal runnable example.
        self.conv = paddle.nn.Conv2D(in_channels=3, out_channels=8, kernel_size=3)

    def forward(self, x):
        return self.conv(x)

net = SimpleNet()
x_spec = InputSpec(shape=[None, 3, 224, 224], dtype='float32', name='x')
paddle.onnx.export(layer=net, path="./SimpleNet", input_spec=[x_spec])
```
Among them
- `InputSpec()` function is used to describe the signature information of the model input, including the `shape`, `type` and `name` of the input data (can be omitted).
- The `paddle.onnx.export()` function needs to be given the model object `net` via `layer`, the save path of the exported model via `path`, and the description of the model's input data via `input_spec`.
Note that `paddlepaddle` `2.0.0` or above should be used. See [paddle.onnx.export](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/onnx/) for more details on the parameters of the `paddle.onnx.export()` function.
@ -0,0 +1,291 @@
# FAQ
## Before You Read
- We collect here some frequently asked questions from issues and user groups since PaddleClas was open-sourced and provide brief answers, aiming to offer a reference and save you from unnecessary detours.
- Image classification, recognition and retrieval are fast-moving fields with quickly updated models and papers, and the answers here mainly rely on our limited project practice, so they cannot cover every facet. We sincerely hope that knowledgeable readers will help supplement and correct the content; thanks a lot.
## Catalogue
- [1. 30 Questions About Image Classification](#1)
- [1.1 Basic Knowledge](#1.1)
- [1.2 Model Training](#1.2)
- [1.3 Data](#1.3)
- [1.4 Model Inference and Prediction](#1.4)
- [2. Application of PaddleClas](#2)
<a name="1"></a>
## 1. 30 Questions About Image Classification
<a name="1.1"></a>
### 1.1 Basic Knowledge
- Q: How many classification metrics are commonly used in the field of image classification?
- A:
- For single-label image classification (containing only 1 category and background), the evaluation metrics are Accuracy, Precision, Recall, F-score, etc. Let TP (True Positive) denote predicting the positive class as positive, FP (False Positive) predicting the negative class as positive, TN (True Negative) predicting the negative class as negative, and FN (False Negative) predicting the positive class as negative. Then Accuracy = (TP + TN) / NUM, Precision = TP / (TP + FP), Recall = TP / (TP + FN). A short sketch of these metrics is given after this list.
- For image classification problems with more than one class, the evaluation metrics are Accuracy and Class-wise Accuracy. Accuracy indicates the percentage of images correctly predicted over all classes relative to the total number of images; Class-wise Accuracy is obtained by calculating the Accuracy for each class of images and then averaging the Accuracy over all classes.
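The single-label metrics above can be sketched in a few lines of NumPy (the labels below are made up for illustration):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # ground-truth binary labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])   # predicted binary labels

tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
tn = np.sum((y_pred == 0) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f_score   = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f_score)
```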
> >
- Q: How to choose the right model to train for my own task?
- A: If you want to deploy on the server with a high requirement for accuracy but not model storage size or prediction speed, then it is recommended to use ResNet_vd, Res2Net_vd, DenseNet, Xception, etc., which are suitable for server-side models. If you want to deploy on the mobile side, then it is recommended to use MobileNetV3 and GhostNet. Meanwhile, we suggest you refer to the speed-accuracy metrics chart in [Model Library](../models/models_intro_en.md) when choosing models.
> >
- Q: How to initialize the parameters and what kind of initialization can speed up the convergence of the model?
- A: It is well known that the initialization of parameters can affect the final performance of the model. In general, if the target dataset is not very large, it is recommended to initialize from the pre-trained model obtained by training on ImageNet-1k. If the network is designed manually, or there are no pre-trained weights based on ImageNet-1k, you can use Xavier initialization or MSRA initialization. The former was proposed for the Sigmoid function and is less friendly to the ReLU function: the deeper the network, the smaller the variance of each layer's input and the harder the network is to train. So when more ReLU activation functions are used in the neural network, MSRA initialization is a better choice.
> >
- Q: What are the better solutions to the problem of parameter redundancy in deep neural networks?
- A: There are several major approaches to compressing models and reducing parameter redundancy, such as pruning, quantization, and knowledge distillation. Model pruning refers to removing relatively unimportant weights from the weight matrix and then fine-tuning the network again. Model quantization refers to a technique that converts floating-point computation into low-bit fixed-point computation, such as 8-bit or 4-bit, which can effectively reduce the computation, parameter size, and memory consumption of the model. Knowledge distillation refers to using a teacher model to guide a student model in learning a specific task, ensuring that the small model, with its number of parameters unchanged, obtains a great performance improvement or even accuracy similar to that of the large model.
> >
- Q: How to choose the right classification model as a backbone network in other tasks, such as target detection, image segmentation, key point detection, etc.?
- A:
Without considering the speed, it is most recommended to use pre-training models and backbone networks with higher accuracy. A series of SSLD knowledge distillation pre-training models are open-sourced in PaddleClas, such as ResNet50_vd_ssld, Res2Net200_vd_26w_4s_ssld, etc., which excel in both model accuracy and speed. For specific tasks, such as image segmentation or key point detection, which require higher image resolution, it is recommended to use neural network models such as HRNet that can take into account both network depth and resolution. And PaddleClas also provides HRNet SSLD distillation series pre-training models including HRNet_W18_C_ssld, HRNet_W48_C_ssld, etc., which have very high accuracy. You can use these models and the backbone network to improve your own model accuracy on other tasks.
> >
- Q: What is the attention mechanism? What are the common methods of it?
- A: The Attention Mechanism (AM) originated from the study of human vision. Using the mechanism on computer vision tasks can effectively capture the useful regions in the images and thus improve the overall network performance. Currently, the most commonly used ones are [SE block](https://arxiv.org/abs/1709.01507), [SK-block](https://arxiv.org/abs/1903.06586), [Non-local block](https://arxiv.org/abs/1711.07971), [GC block](https://arxiv.org/abs/1904.11492), [CBAM](https://arxiv.org/abs/1807.06521), etc. The core idea is to learn the importance of feature maps in different regions or different channels, so that the network can pay more attention to the regions of salience.
<a name="1.2"></a>
### 1.2 Model Training
> >
- Q: What will happen if a model with 10 million classes is trained during the image classification with deep convolutional networks?
- A: Because of the large number of parameters in the FC layer, the memory/video memory/model storage usage will increase significantly; the model convergence speed will also be slower. In this case, it is recommended to add a layer of FC with a smaller dimension before the last FC layer, which can drastically reduce the storage size of the model.
> >
- Q: What are the possible reasons if the model converges poorly during the training process?
- A: There are several points that can be investigated: (1) The data annotation should be checked to ensure that there are no problems with the labeling of the training and validation sets. (2) Try to adjust the learning rate (initially by a factor of 10). A learning rate that is too large (training oscillation) or too small (slow convergence) may lead to poor convergence. (3) Huge amount of data and an overly small model may prevent it from learning all the features of the data. (4) See if normalization is used in the data preprocessing process. It may be slower without normalization operation. (5) If the amount of data is relatively small, you can try to load the pre-trained model based on ImageNet-1k dataset provided in PaddleClas, which can greatly improve the training convergence speed. (6) There is a long tail problem in the dataset, you can refer to the [solution to the long tail problem of data](#long_tail).
> >
- Q: How to choose the right optimizer when training image classification tasks?
- A: Since the emergence of deep learning, there has been a lot of research on optimizers, which aim to minimize the loss function and find the right weights for a given task. Currently, the main optimizers used in industry are SGD, RMSProp, Adam, AdaDelta, etc. Among them, since the SGD optimizer with momentum is widely used in academia and industry, most of the models we publish (for classification tasks) also adopt this optimizer to perform gradient descent on the loss function. It has two disadvantages: slow convergence and reliance on experience for setting the initial learning rate. However, if the initial learning rate is set properly and the number of iterations is sufficient, this optimizer will stand out among many others, achieving higher accuracy on the validation set. Optimizers with adaptive learning rates, such as Adam and RMSProp, tend to converge fast, but their final accuracy is slightly worse. If you pursue faster convergence, we recommend these adaptive learning rate optimizers; for higher final accuracy, use the SGD optimizer with momentum.
- Q: What are the current mainstream learning rate decay strategies? How to choose?
- A: The learning rate is the speed at which the network weights are adjusted according to the gradient of the loss function. The lower the learning rate, the slower the loss function changes. While a low learning rate ensures that no local minima are missed, it also means that convergence takes longer, especially when trapped in a plateau region. Throughout training we cannot use the same learning rate to update the weights, otherwise the optimal point cannot be reached, so we need to adjust the learning rate during training. In the initial stage, since the weights are randomly initialized and the loss decreases fast, a larger learning rate can be set. In the later stage, since the weights are close to the optimal values, a larger learning rate cannot get closer to the optimum, so a smaller learning rate is a better choice. As for the learning rate decay strategy, many researchers and practitioners use piecewise_decay (step_decay), which decays the learning rate stepwise. There are also other methods, such as polynomial_decay, exponential_decay, cosine_decay, etc. Among them, cosine_decay requires no hyperparameter tuning and is more robust, thus emerging as the preferred learning rate decay method for improving model accuracy. The learning rates of cosine_decay and piecewise_decay are shown in the following figure. It is easy to observe that cosine_decay keeps a relatively large learning rate throughout training, so it converges slowly, but its final result is better than that of piecewise_decay.
![](../../images/models/lr_decay.jpeg)
> >
- Q: What is the Warmup strategy? Where is it applied?
- A: The warmup strategy, which, as the name implies, is a warm-up for the learning rate with no direct adoption of maximum learning rate at the beginning of training, but to train the network with a gradually increasing rate, and then decay the learning rate when it peaks. When training a neural network with a large batch_size, it is recommended to use the warmup strategy. Experiments show that warmup can steadily improve the accuracy of the model when the batch_size is large. For example, when training MobileNetV3, we set the epoch in warmup to 5 by default, i.e., first increase the learning rate from 0 to the maximum value with 5 epochs, and then conduct the corresponding decay of the learning rate.
> >
- Q: What is `batch size`? How to choose the appropriate `batch size` during training?
- A: `batch size` is an important hyperparameter in neural network training; its value determines how much data is fed into the neural network for training at a time. According to the paper [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677), when the learning rate is scaled linearly with the `batch size`, the convergence accuracy is almost unaffected. When training on ImageNet-1k, most neural networks choose an initial learning rate of 0.1 and a `batch size` of 256. Therefore, depending on the actual model size and video memory, the learning rate can be set to 0.1*k and the batch_size to 256*k. This setting can also serve as the initial parameters for further tuning the learning rate in real tasks to obtain better performance.
> >
- Q: What is `weight_decay`? How to choose it?
- A: Overfitting is a common term in machine learning, which is simply understood as a model that performs well on training data but less satisfactory on test data. In image classification, there is also the problem of overfitting, and many regularization methods are proposed to avoid it, among which weight_decay is one of the widely used ways. When using SGD optimizer, weight_decay is equivalent to adding L2 regularization after the final loss function, which makes the weights of the network tend to choose smaller values, so eventually, the parameter values in the whole network tend to be more towards 0, and the generalization performance of the model is improved accordingly. In the implementation of major deep learning frameworks, this value means the coefficient before the L2 regularization, which is called L2Decay in the PaddlePaddle framework. The larger the coefficient is, the stronger the added regularization is, and the more the model tends to be underfitted. When training ImageNet, most networks set the value of this parameter to 1e-4, and in some smaller networks such as the MobileNet series network, the value is set between 1e-5 and 4e-5 to avoid the underfitting. Of course, the setting of this value is also related to specific datasets. When the dataset of the task is large, the network itself tends to be under-fitted and the value should be reduced appropriately, and when it is small, the network itself tends to be over-fitted and the value should be increased. The following table shows the accuracy of MobileNetV1_x0_25 on ImageNet-1k using different l2_decay. Since MobileNetV1_x0_25 is a relatively small network, too large a l2_decay will tend to underfit the network, so 3e-5 is a better choice in this network compared to 1e-4.
| Model | L2_decay | Train acc1/acc5 | Test acc1/acc5 |
| ----------------- | -------- | --------------- | -------------- |
| MobileNetV1_x0_25 | 1e-4 | 43.79%/67.61% | 50.41%/74.70% |
| MobileNetV1_x0_25 | 3e-5 | 47.38%/70.83% | 51.45%/75.45% |
> >
- Q: What does label smoothing (label_smoothing) refer to? What is the effect? What kind of scenarios does it usually apply to?
- A: Label_smoothing is a regularization method in deep learning, whose full name is Label Smoothing Regularization (LSR). In a traditional classification task, the loss function is the cross-entropy between the real one-hot label and the output of the neural network, while label_smoothing smooths the real one-hot label so that the label learned by the network is no longer a hard label but a soft label of probability values, where the probability at the position of the true category is the largest and the others are small. See the paper [2] for the detailed calculation. In label_smoothing, the epsilon parameter describes the degree of label softening: the larger the value, the smaller the probability value of the smoothed label vector and the smoother the label, and vice versa. The value is usually set to 0.1 in experiments on ImageNet-1k, and there is a steady increase in accuracy for models of ResNet50 size and above after using label_smoothing. The table below shows the accuracy metrics of ResNet50_vd before and after using label_smoothing. At the same time, since label_smoothing can be regarded as a regularization method, the accuracy improvement is not obvious or even decreases on relatively small models. The second table shows the accuracy metrics of ResNet18 before and after using label_smoothing on ImageNet-1k; it is clear that the accuracy drops after using label_smoothing. A short sketch of the label smoothing calculation is given after the table.
| Model | Use_label_smoothing | Test acc1 |
| ----------- | ------------------- | --------- |
| ResNet50_vd | 0 | 77.9% |
| ResNet50_vd | 1 | 78.4% |
| ResNet18 | 0 | 71.0% |
| ResNet18 | 1 | 70.8% |
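As referenced above, here is a short NumPy sketch of the soft-label calculation (the class count and epsilon value are illustrative):

```python
import numpy as np

num_classes, epsilon = 5, 0.1
hard_label = np.eye(num_classes)[2]                    # one-hot label of class 2
soft_label = (1 - epsilon) * hard_label + epsilon / num_classes
print(soft_label)                                      # [0.02 0.02 0.92 0.02 0.02]
```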
> >
- Q: How to determine the tuning strategy by the accuracy or loss of the training and validation sets during training?
- A: In the process of training a network, the accuracy on the training set and validation set is usually printed for each epoch, which portrays the performance of the model on both datasets. Generally speaking, the training accuracy should be comparable to or slightly higher than the validation accuracy. If the training accuracy is much higher than the validation accuracy, the model has overfitted the training set and stronger regularization is needed, such as increasing the value of L2Decay, adding more data augmentation strategies, introducing label_smoothing, etc. If the training accuracy is lower than the validation accuracy, the model is probably underfitting, and the regularization should be weakened during training, such as reducing the value of L2Decay, using fewer data augmentation methods, increasing the crop area of the images, weakening the image stretching, removing label_smoothing, etc.
> >
- Q: How to improve the accuracy of my own dataset by pre-training the model?
- A: At this stage, it has become a common practice in the image recognition field to load pre-trained models to train their own tasks, which can often improve the accuracy of a particular task compared to training from random initialization. In general, the pre-training model widely used in the industry is obtained by training the ImageNet-1k dataset of 1.28 million images of 1000 classes. The fc layer weights of this pre-training model are a matrix of k*1000, where k is the number of neurons before the fc layer, and it is not necessary to load the fc layer weights when loading the pre-training weights. In terms of the learning rate, if your dataset is particularly small (e.g., less than 1,000), we recommend you to adopt a small initial learning rate, e.g., 0.001 (batch_size:256, the same below), so as not to corrupt the pre-training weights with a larger learning rate. If your training dataset is relatively large (>100,000), we suggest you try a larger initial learning rate, such as 0.01 or above.
<a name="1.3"></a>
### 1.3 Data
> >
- Q: What are the general steps involved in the data pre-processing for image classification?
- A: When training ResNet50 on ImageNet-1k dataset, an image is fed into the network, and there are several steps: image decoding, random cropping, random horizontal flipping, normalization, data rearrangement, group batching and feeding into the network. Image decoding refers to reading the image file into memory; random cropping refers to randomly stretching and cropping the read image to an image with the length and width of 224 ; random horizontal flipping refers to flipping the cropped image horizontally with a probability of 0.5; normalization refers to centering the data of each channel of the image by de-meaning, so that the data conforms to the `N(0,1)` normal distribution as much as possible; data rearrangement refers to changing the data from `[224,224,3]` format to `[3,224,224]`; and group batching refers to forming a batch of multiple images and feeding them into the network for training.
> >
- Q: How does random-crop affect the performance of small model training?
- A: In the standard preprocessing of ImageNet-1k data, the random_crop function defines two values, scale and ratio, which respectively determine the size of the crop and the degree of image stretching; the default value of the former is 0.08-1 (lower_scale-upper_scale) and of the latter 3/4-4/3 (lower_ratio-upper_ratio). In very small networks, this kind of data augmentation can lead to underfitting and decreased accuracy. To this end, the augmentation can be made weaker by increasing the crop area of the image or decreasing the stretching of the image, i.e., by increasing the value of lower_scale or reducing the difference between lower_ratio and upper_ratio, respectively (a small sampling sketch follows the table below). The following table shows the accuracy of training MobileNetV2_x0_25 with different lower_scale values; both the training and validation accuracy improve when the crop area of the images is increased.
| Model | Range of Scale | Train_acc1/acc5 | Test_acc1/acc5 |
| ----------------- | -------------- | --------------- | -------------- |
| MobileNetV2_x0_25 | [0.08,1] | 50.36%/72.98% | 52.35%/75.65% |
| MobileNetV2_x0_25 | [0.2,1] | 54.39%/77.08% | 53.18%/76.14% |
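The sketch below illustrates, under simplified assumptions (no retry logic), how `scale` and `ratio` jointly determine the sampled crop; raising `lower_scale` or narrowing the ratio range makes the transformation weaker.
```python
import numpy as np

def sample_crop_size(height, width, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3)):
    area = height * width * np.random.uniform(*scale)    # cropped area relative to the original image
    aspect = np.random.uniform(*ratio)                   # degree of stretching
    w = int(round(np.sqrt(area * aspect)))
    h = int(round(np.sqrt(area / aspect)))
    return min(h, height), min(w, width)                 # the crop is then resized to 224x224

print(sample_crop_size(256, 256, scale=(0.2, 1.0)))      # weaker augmentation than the default (0.08, 1.0)
```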
> >
- Q: What are the common data augmentation methods currently available to increase the richness of training samples when the amount of data is insufficient?
- A: PaddleClas classifies data augmentation methods into three categories: image transformation, image cropping, and image aliasing. Image transformation mainly includes AutoAugment and RandAugment; image cropping contains CutOut, RandErasing, HideAndSeek, and GridMask; and image aliasing comprises Mixup and Cutmix. A more detailed introduction to data augmentation can be found in the chapter on [Data Augmentation](../algorithm_introduction/DataAugmentation_en.md).
> >
- Q: For image classification scenarios where occlusion is common, what data augmentation methods should be used to improve the accuracy of the model?
- A: During the training, you can try to adopt cropping data augmentations including CutOut, RandErasing, HideAndSeek and GridMask on the training set, so that the model can learn not only the significant regions but also the non-significant regions, thus better performing the recognition task.
> >
- Q: What data augmentation methods should be used to improve model accuracy in the case of complex color transformations?
- A: Consider using the data augmentation strategies of AutoAugment or RandAugment, both of which include rich color transformations such as sharpening and histogram equalization, allowing the model to be more robust to these transformations during the training process.
> >
- Q: How do Mixup and Cutmix work? Why are they effective methods of data augmentation?
- A: Mixup generates a new image by linearly blending two images, and the corresponding labels are blended with the same coefficient for training; Cutmix crops a random region of interest (ROI) from one image and pastes it over the corresponding region of the current image, and the labels are blended in proportion to the pasted area. Both methods effectively generate samples and labels that differ from the original training set for the network to learn from, thus enriching the samples.
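A minimal NumPy sketch of Mixup on a batch is given below; `images` is assumed to be a float array of shape `[N, 3, 224, 224]` and `one_hot_labels` of shape `[N, class_num]`.
```python
import numpy as np

def mixup(images, one_hot_labels, alpha=0.2):
    lam = np.random.beta(alpha, alpha)                 # mixing coefficient
    idx = np.random.permutation(len(images))           # pair each sample with a random partner
    mixed_images = lam * images + (1.0 - lam) * images[idx]
    mixed_labels = lam * one_hot_labels + (1.0 - lam) * one_hot_labels[idx]
    return mixed_images, mixed_labels
```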
> >
- Q: What is the size of the training dataset for an image classification task that does not require high accuracy?
- A: The amount of training data is related to the complexity of the problem to be solved. The greater the difficulty and the higher the accuracy requirement, the larger the dataset needs to be, and in practice more training data usually gives better results. Of course, in general, 10-20 images per category with a pre-trained model can guarantee a basic classification effect; without a pre-trained model, at least 100-200 images per category are needed.
> >
<a name="long_tail"></a>
- Q: What are the common methods currently used for datasets with long-tailed distributions?
- A: (1) The categories with fewer images can be resampled to increase the probability of their occurrence; (2) the loss can be modified to increase the loss weight of images in the categories with fewer images; (3) transfer learning can be borrowed to learn generic knowledge from the common categories and then migrate it to the categories with fewer samples.
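As a small illustration of option (1), the sketch below computes per-sample sampling weights that are inversely proportional to class frequency; the helper is hypothetical and not part of PaddleClas.
```python
import numpy as np
from collections import Counter

def resampling_weights(labels):
    counts = Counter(labels)                                  # images per class
    return np.array([1.0 / counts[y] for y in labels])        # rare classes are drawn more often

print(resampling_weights([0, 0, 0, 0, 1]))                    # -> [0.25 0.25 0.25 0.25 1.  ]
```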
<a name="1.4"></a>
### 1.4 Model Inference and Prediction
> >
- Q: How to deal with the poor recognition performance when the original image is taken for classification with only a small part of the image being the foreground object of interest?
- A: A mainbody detection model can be added before classification to detect the foreground objects, which can greatly improve the final recognition results. If time cost is not a concern, multi-crop can also be used to fuse all the predictions to determine the final category.
> >
- Q: What are the currently recommended model inference methods?
- A: After the model is trained, it is recommended to export an inference model and run it with the Paddle inference engine, which currently supports Python and C++ inference. If you want to deploy the inference model as a service, it is recommended to use PaddleServing.
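A minimal Python sketch with the Paddle Inference API is shown below; the model paths and input shape are assumptions for illustration.
```python
import numpy as np
from paddle.inference import Config, create_predictor

config = Config("inference/inference.pdmodel", "inference/inference.pdiparams")
config.disable_gpu()                       # or config.enable_use_gpu(8000, 0) on GPU
predictor = create_predictor(config)

input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(np.random.rand(1, 3, 224, 224).astype("float32"))
predictor.run()
output = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()
print(output.shape)                        # class scores for the batch
```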
> >
- Q: What are the appropriate inference methods to further improve the model accuracy after training?
- A: (1) A larger inference scale can be used, e.g., training at 224 and then inferring at 288 or 320, which directly brings about a 0.5% improvement in accuracy. (2) Test Time Augmentation (TTA) can be used: create multiple copies of the test image by rotating, flipping, color transforming, and so on, and then fuse all the prediction results, which can improve accuracy and robustness. (3) A multi-model fusion strategy can also be adopted to fuse the prediction results of multiple models for the same image.
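A minimal sketch of option (2) with a horizontal-flip TTA is shown below; `predict_fn` is a hypothetical callable mapping a CHW image to a probability vector.
```python
import numpy as np

def tta_predict(predict_fn, image):
    views = [image, image[:, :, ::-1]]            # original + horizontal flip (CHW layout)
    probs = [predict_fn(v) for v in views]
    return np.mean(probs, axis=0)                 # fuse the predictions of all views
```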
> >
- Q: How to choose the best model for the fusion of multiple models?
- A: Without considering the inference speed, models with the highest possible accuracy are recommended; it is also suggested to choose models with different structures or from different series for fusion. For example, with similar accuracy, the fusion of ResNet50_vd and Xception65 tends to outperform the fusion of ResNet50_vd and ResNet101_vd.
> >
- Q: What are the common acceleration methods when using a fixed model for inference?
- A: (1) Using a GPU with better performance; (2) increasing the batch size; (3) using TensorRT and FP16 half-precision inference.
<a name="2"></a>
## 2. Application of PaddleClas
> >
- Q: Why can't I import parameters even though I have specified the address of the folder where the pre-trained model is located during evaluation and inference?
- A: When loading a pretrained model, you need to specify its prefix. For example, if the folder where the pretrained model parameters are located is `output/ResNet50_vd/19` and the name of the pretrained parameter file is `output/ResNet50_vd/19/ppcls.pdparams`, then the `pretrained_model` parameter needs to be specified as `output/ResNet50_vd/19/ppcls`, and PaddleClas will automatically append the `.pdparams` suffix.
> >
- Q: Why is the final accuracy always about 0.3% lower than the official one when evaluating the `EfficientNetB0_small` model?
- A: The `EfficientNet` series networks use `cubic interpolation` for resize (the interpolation parameter of resize is set to 2), while other models use the default (None), so the interpolation value needs to be explicitly specified during training and evaluation. Specifically, you can refer to the `ResizeImage` parameter in the preprocessing section of the following configuration.
```
Eval:
dataset:
name: ImageNetDataset
image_root: ./dataset/ILSVRC2012/
cls_label_path: ./dataset/ILSVRC2012/val_list.txt
transform_ops:
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
resize_short: 256
interpolation: 2
- CropImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
```
> >
- Q: Why is `TypeError: __init__() missing 1 required positional argument: 'sync_cycle'` reported when using visualdl under python2?
- A: Currently visualdl only supports running under python3, and its version needs to be 2.0 or higher. If the version is not right, you can install it as follows: `pip3 install visualdl -i https://mirror.baidu.com/pypi/simple`
> >
- Q: Why is the inference speed of ResNet50_vd on a single image much lower than the benchmark provided on the official website, and why is the CPU faster than the GPU in this case?
- A: Model inference needs to be initialized, which is time-consuming. Therefore, when measuring inference speed, we need to run a batch of images, exclude the inference time of the first few images, and then average the remaining time. The GPU is slower than the CPU on a single image because GPU initialization is much slower than CPU initialization.
> >
- Q: Can grayscale images be used for model training?
- A: Grayscale images can also be used for model training, but the input shape of the model needs to be modified to `[1, 224, 224]`, and the data augmentation also needs to be adapted. However, to make better use of the PaddleClas code, it is recommended to convert the grayscale image to a 3-channel image for training (the RGB channels have equal pixel values).
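A one-step NumPy sketch of the recommended conversion is shown below; the grayscale array is a stand-in.
```python
import numpy as np

gray = np.random.randint(0, 256, (224, 224), dtype="uint8")   # stand-in grayscale image
rgb_like = np.repeat(gray[:, :, None], 3, axis=2)             # replicate into 3 equal channels
print(rgb_like.shape)                                         # (224, 224, 3)
```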
> >
- Q: How to train the model on Windows or CPU?
- A: You can refer to [Getting Started Tutorial](../models_training/classification_en.md) for detailed tutorials on model training, evaluation and inference on Linux, Windows, CPU, and other environments.
> >
- Q: How to use label smoothing in model training?
- A: This can be set in the `Loss` field of the configuration file as follows. `epsilon: 0.1` sets the smoothing factor to 0.1; if the `epsilon` field is not set, `label smoothing` will not be used.
```
Loss:
Train:
- CELoss:
weight: 1.0
epsilon: 0.1
```
> >
- Q: Is the 100,000-class (10W) image classification pre-trained model provided by PaddleClas available for model inference?
- A: This 100,000-class image classification pre-trained model does not provide the parameters of the fc fully connected layer, so it cannot be used for model inference directly, but it is available for model fine-tuning.
> >
- Q: Why is `Error: Pass tensorrt_subgraph_pass has not been registered` reported when using `deploy/python/predict_cls.py` for model prediction?
- A: If you want to use TensorRT for model prediction and inference, you need to install or compile PaddlePaddle with TensorRT by yourself. For Linux, Windows, macOS users, you can refer to [download inference library](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html). If there is no required version, you need to compile and install it locally, which is detailed in [source code compilation](https://paddleinference.paddlepaddle.org.cn/user_guides/source_compile.html).
> >
- Q: How to use Automatic Mixed Precision (AMP) during training?
- A: You can refer to [ResNet50_fp16.yaml](../../../ppcls/configs/ImageNet/ResNet/ResNet50_fp16.yaml). Specifically, if you want your configuration file to support automatic mixed precision during model training, you can add the following information to the file.
```
# mixed precision training
AMP:
scale_loss: 128.0
use_dynamic_loss_scaling: True
use_pure_fp16: &use_pure_fp16 True
```
@ -0,0 +1,10 @@
faq_series
================================
.. toctree::
:maxdepth: 2
faq_2021_s2_en.md
faq_2021_s1_en.md
faq_2020_s1_en.md
faq_selected_30_en.md
@ -0,0 +1,182 @@
# Feature Extraction
## Catalogue
- [1.Introduction](#1)
- [2.Network Structure](#2)
- [3.General Recognition Models](#3)
- [4.Customized Feature Extraction](#4)
- [4.1 Data Preparation](#4.1)
- [4.2 Model Training](#4.2)
- [4.3 Model Evaluation](#4.3)
- [4.4 Model Inference](#4.4)
<a name="1"></a>
## 1.Introduction
Feature extraction plays a key role in image recognition, which serves to transform the input image into a fixed dimensional feature vector for subsequent [vector search](./vector_search_en.md). Good features boast great similarity preservation, i.e., in the feature space, pairs of images with high similarity should have higher feature similarity (closer together), and pairs of images with low similarity should have less feature similarity (further apart). [Deep Metric Learning](../algorithm_introduction/metric_learning_en.md) is applied to explore how to obtain features with high representational power through deep learning.
<a name="2"></a>
## 2.Network Structure
In order to customize the image recognition task flexibly, the whole network is divided into Backbone, Neck, Head, and Loss. The figure below illustrates the overall structure:
![img](../../images/feature_extraction_framework_en.png)
Functions of the above modules (a minimal composition sketch follows the list):
- **Backbone**: Specifies the backbone network to be used. It is worth noting that the ImageNet-based pre-trained models provided by PaddleClas output 1000 dimensions in the last layer, which needs to be customized according to the required feature dimensions.
- **Neck**: Used for feature augmentation and feature dimension transformation. Here it can be a simple Linear Layer for feature dimension transformation, or a more complex FPN structure for feature augmentation.
- **Head**: Used to transform features into logits. In addition to the common Fc Layer, cosmargin, arcmargin, circlemargin and other modules are all available choices.
- **Loss**: Specifies the Loss function to be used. It is designed as a combined form to facilitate the combination of Classification Loss and Pair_wise Loss.
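The sketch below illustrates this Backbone → Neck → Head composition with a toy backbone and a plain linear head; it is only a simplified stand-in for the real PaddleClas modules (e.g. PP_LCNet_x2_5, ArcMargin).
```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class RecModel(nn.Layer):
    def __init__(self, backbone, backbone_dim, embedding_size, class_num):
        super().__init__()
        self.backbone = backbone                                            # feature extractor
        self.neck = nn.Linear(backbone_dim, embedding_size)                 # feature dimension transformation
        self.head = nn.Linear(embedding_size, class_num, bias_attr=False)   # simplified classification head

    def forward(self, x):
        embedding = self.neck(self.backbone(x))
        logits = self.head(F.normalize(embedding, axis=1))                  # a margin-based head is used in practice
        return embedding, logits                                            # only the embedding is used for retrieval

toy_backbone = nn.Sequential(nn.Conv2D(3, 64, 3, stride=2, padding=1),
                             nn.AdaptiveAvgPool2D(1), nn.Flatten())
model = RecModel(toy_backbone, backbone_dim=64, embedding_size=512, class_num=185341)
emb, logits = model(paddle.rand([2, 3, 224, 224]))
```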
<a name="3"></a>
## 3.General Recognition Models
In PP-Shitu, we have [PP_LCNet_x2_5](../models/PP-LCNet.md) as the backbone network, Linear Layer for Neck, [ArcMargin](../../../ppcls/arch/gears/arcmargin.py) for Head, and CELoss for Loss. See the details in [General Recognition_configuration files](../../../ppcls/configs/GeneralRecognition/). The involved training data covers the following seven public datasets:
| Datasets | Data Size | Class Number | Scenarios | URL |
| ------------ | --------- | ------------ | ------------------ | ------------------------------------------------------------ |
| Aliproduct | 2498771 | 50030 | Commodities | [URL](https://retailvisionworkshop.github.io/recognition_challenge_2020/) |
| GLDv2 | 1580470 | 81313 | Landmarks | [URL](https://github.com/cvdfoundation/google-landmark) |
| VeRI-Wild | 277797 | 30671 | Vehicle | [URL](https://github.com/PKU-IMRE/VERI-Wild) |
| LogoDet-3K | 155427 | 3000 | Logo | [URL](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| iCartoonFace | 389678 | 5013 | Cartoon Characters | [URL](http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d) |
| SOP | 59551 | 11318 | Commodities | [URL](https://cvgl.stanford.edu/projects/lifted_struct/) |
| Inshop | 25882 | 3997 | Commodities | [URL](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) |
| **Total** | **5M** | **185K** | ---- | ---- |
The results are shown in the table below:
| Model | Aliproduct | VeRI-Wild | LogoDet-3K | iCartoonFace | SOP | Inshop | Latency(ms) |
| ------------- | ---------- | --------- | ---------- | ------------ | ----- | ------ | ----------- |
| PP-LCNet-2.5x | 0.839 | 0.888 | 0.861 | 0.841 | 0.793 | 0.892 | 5.0 |
- Evaluation metric: `Recall@1`
- CPU of the speed evaluation machine: `Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`.
- Evaluation conditions for the speed metric: MKLDNN enabled, number of threads set to 10
- Address of the pre-training model: [General recognition pre-training model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams)
<a name="4"></a>
## 4.Customized Feature Extraction
Customized feature extraction refers to retraining the feature extraction model based on one's own task. It consists of four main steps: 1) data preparation, 2) model training, 3) model evaluation, and 4) model inference.
<a name="4.1"></a>
### 4.1 Data Preparation
To start with, customize your dataset based on the task (See [Format description](../data_preparation/recognition_dataset_en.md#1) for the dataset format). Before initiating the model training, modify the data-related content in the configuration files, including the address of the dataset and the class number. The corresponding locations in configuration files are shown below:
```
Head:
name: ArcMargin
embedding_size: 512
class_num: 185341 #Number of class
```
```
Train:
dataset:
name: ImageNetDataset
image_root: ./dataset/ #The directory where the train dataset is located
cls_label_path: ./dataset/train_reg_all_data.txt #The address of label file for train dataset
```
```
Query:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/ #The directory where the query dataset is located
cls_label_path: ./dataset/Aliproduct/val_list.txt #The address of label file for query dataset
```
```
Gallery:
dataset:
name: VeriWild
image_root: ./dataset/Aliproduct/ #The directory where the gallery dataset is located
cls_label_path: ./dataset/Aliproduct/val_list.txt #The address of label file for gallery dataset
```
<a name="4.2"></a>
### 4.2 Model Training
- Single machine single card training
```
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
```
- Single machine multi card training
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--gpus="0,1,2,3" tools/train.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
```
**Note:** The configuration file adopts `online evaluation` by default; if you want to speed up training by disabling `online evaluation`, just add `-o eval_during_train=False` after the above command. After training, the model files `latest` and `best_model` and the training log file `train.log` will be generated under the `output` directory. Among them, `best_model` stores the best model under the current evaluation metrics, while `latest` stores the most recently saved model, making it convenient to resume training from where it was interrupted.
- Resumption of Training
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--gpus="0,1,2,3" tools/train.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-o Global.checkpoint="output/RecModel/latest"
```
<a name="4.3"></a>
### 4.3 Model Evaluation
- Single Card Evaluation
```
export CUDA_VISIBLE_DEVICES=0
python tools/eval.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-o Global.pretrained_model="output/RecModel/best_model"
```
- Multi Card Evaluation
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch \
--gpus="0,1,2,3" tools/eval.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-o Global.pretrained_model="output/RecModel/best_model"
```
**Recommendation:** It is suggested to employ multi-card evaluation, which can quickly obtain the feature set of the overall dataset using multi-card parallel computing, accelerating the evaluation process.
<a name="4.4"></a>
### 4.4 Model Inference
Two steps are included in the inference: 1) exporting the inference model; 2) obtaining the feature vector.
#### 4.4.1 Export Inference Model
```
python tools/export_model.py \
-c ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml \
-o Global.pretrained_model="output/RecModel/best_model"
```
The generated inference models are under the `inference` directory, which comprises three files, namely `inference.pdmodel`, `inference.pdiparams`, and `inference.pdiparams.info`. Among them, `inference.pdmodel` stores the structure of the inference model, while `inference.pdiparams` and `inference.pdiparams.info` store the model parameters.
#### 4.4.2 Obtain Feature Vector
```
cd deploy
python python/predict_rec.py \
-c configs/inference_rec.yaml \
-o Global.rec_inference_model_dir="../inference"
```
The output format of the obtained features is shown in the figure below:
![img](../../images/feature_extraction_output.png)
In practical use, however, business operations require more than simply obtaining features. To further perform image recognition by feature retrieval, please refer to the document [vector search](./vector_search_en.md).
@ -0,0 +1,9 @@
image_recognition_pipeline
================================
.. toctree::
:maxdepth: 2
mainbody_detection_en.md
feature_extraction_en.md
vector_search_en.md
@ -0,0 +1,242 @@
# Mainbody Detection
Mainbody detection is a widely used detection technology. It refers to detecting the coordinates of one or more objects in the whole image and then cropping the corresponding areas for recognition. Mainbody detection is the first step of the recognition task and can effectively improve the recognition accuracy.
This tutorial will introduce the technology from three aspects, namely, the datasets, model selection and model training.
## Catalogue
- [1. Dataset](#1)
- [2. Model Selection](#2)
- [2.1 Lightweight Mainbody Detection Model](#2.1)
- [2.2 Server-side Mainbody Detection Model](#2.2)
- [3. Model Training](#3)
- [3.1 Prepare For the Environment](#3.1)
- [3.2 Prepare For the Dataset](#3.2)
- [3.3 Configuration Files](#3.3)
- [3.4 Begin the Training Process](#3.4)
- [3.5 Model Prediction](#3.5)
- [3.6 Model Export and Inference Deployment](#3.6)
<a name="1"></a>
## 1. Dataset
The datasets we used for mainbody detection tasks are shown in the following table.
| Dataset | Image Number | Image Number Used in Mainbody Detection | Scenarios | Dataset Link |
| ------------ | ------------ | --------------------------------------- | ----------------- | ---------------------------------------------------------- |
| Objects365   | 1.7M         | 6k                                      | General Scenarios | [Link](https://www.objects365.org/overview.html)           |
| COCO2017     | 120k         | 5k                                      | General Scenarios | [Link](https://cocodataset.org/)                           |
| iCartoonFace | 2k | 2k | Cartoon Face | [Link](https://github.com/luxiangju-PersonAI/iCartoonFace) |
| LogoDet-3k | 3k | 2k | Logo | [Link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) |
| RPC | 3k | 3k | Product | [Link](https://rpc-dataset.github.io/) |
In the actual training process, all datasets are mixed together. Categories of all the labeled boxes are modified as `foreground`, and the detection model we trained only contains one category (`foreground`).
<a name="2"></a>
## 2. Model Selection
There are a wide variety of object detection methods, such as the commonly used two-stage detectors (the Faster R-CNN series, etc.), single-stage detectors (YOLO, SSD, etc.), and anchor-free detectors (FCOS, etc.). PaddleDetection provides self-developed PP-YOLO models for server-side scenarios and PicoDet models for end-side scenarios (CPU and mobile), both of which are leading in their areas.
Building on the work above, PaddleClas provides a lightweight mainbody detection model for end-side scenarios and a server-side mainbody detection model for server-side scenarios. The table below presents the average mAP over the 5 datasets and a comparison of model size and inference speed.
| Model | Model Structure | Download Link of Pre-trained Model | Download Link of Inference Model | mAP | Size of Inference Model (MB) | Inference Time per Image (preprocessing excluded)(ms) |
| ------------------------------------ | --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ----- | ---------------------------- | ----------------------------------------------------- |
| Lightweight Mainbody Detection Model | PicoDet | [Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_pretrained.pdparams) | [Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | 40.1% | 30.1 | 29.8 |
| Server-side Mainbody Detection Model | PP-YOLOv2 | [Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams) | [Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | 42.5% | 210.5 | 466.6 |
Notes:
- CPU of the speed evaluation machine: `Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz`. The speed metric is the testing result with MKLDNN enabled and the number of threads set to 10.
- Mainbody detection has a time-consuming preprocessing procedure, with an average time of about 40 to 55 ms per image on the above machine, so it is not included in the inference time.
<a name="2.1"></a>
### 2.1 Lightweight Mainbody Detection Model
PicoDet, introduced by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection), is an object detection algorithm for CPU or mobile-side scenarios. It integrates the following optimization algorithms.
- [ATSS](https://arxiv.org/abs/1912.02424)
- [Generalized Focal Loss](https://arxiv.org/abs/2006.04388)
- Cosine learning rate decay
- Cycle-EMA
- Lightweight detection head
For more details of optimized PicoDet and benchmark, you can refer to [Tutorials of PicoDet Models](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/README.md).
To balance the detection speed and effects in lightweight mainbody detection tasks, we adopt PPLCNet_x2_5 as the backbone of the model and revise the image scale for training and inference to 640x640, with the rest configured the same as [picodet_m_shufflenetv2_416_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/configs/picodet/picodet_m_shufflenetv2_416_coco.yml). The final detection model is obtained after the training of customized mainbody detection datasets.
<a name="2.2"></a>
### 2.2 Server-side Mainbody Detection Model
PP-YOLO is proposed by [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). It greatly optimizes the YOLOv3 model from multiple perspectives such as backbone, data augmentation, regularization strategy, loss function, and post-processing, reaching the state of the art in the speed-accuracy trade-off. The optimization strategies are as follows.
- Better backbone: ResNet50vd-DCN
- A larger training batch size: 8 GPUs with a mini-batch size of 24 on each GPU, with the learning rate and the number of iterations adjusted accordingly.
- [Drop Block](https://arxiv.org/abs/1810.12890)
- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
- [IoU Loss](https://arxiv.org/pdf/1902.09630.pdf)
- [Grid Sensitive](https://arxiv.org/abs/2004.10934)
- [Matrix NMS](https://arxiv.org/pdf/2003.10152.pdf)
- [CoordConv](https://arxiv.org/abs/1807.03247)
- [Spatial Pyramid Pooling](https://arxiv.org/abs/1406.4729)
- Better Pre-trained Model
For more information about PP-YOLO, you can refer to [PP-YOLO tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release%2F2.1/configs/ppyolo/README.md).
In the mainbody detection task, we use `ResNet50vd-DCN` as our backbone for better performance. The config file is [ppyolov2_r50vd_dcn_365e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml), in which the dataset path is modified to the customized mainbody detection dataset. The final detection model can be downloaded [here](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar).
<a name="3"></a>
## 3. Model Training
This section mainly talks about how to train your own mainbody detection model using PaddleDetection on your own datasets.
<a name="3.1"></a>
### 3.1 Prepare For the Environment
Download PaddleDetection and install requirements.
```shell
cd <path/to/clone/PaddleDetection>
git clone https://github.com/PaddlePaddle/PaddleDetection.git
cd PaddleDetection
# install requirements
pip install -r requirements.txt
```
For more installation tutorials, please refer to [Installation Tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/docs/tutorials/INSTALL.md)
<a name="3.2"></a>
### 3.2 Prepare For the Dataset
For customized dataset, you should convert it to COCO format. Please refer to [Customized Dataset Tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/static/docs/tutorials/Custom_DataSet.md) to build your own datasets with COCO format.
In the mainbody detection task, all the objects belong to the foreground. Therefore, `category_id` of all the objects in the annotation file should be modified to 1, and the `categories` map should be modified as follows, in which only the class `foreground` is included (a small conversion sketch follows the snippet below).
```
[{u'id': 1, u'name': u'foreground', u'supercategory': u'foreground'}]
```
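A small sketch of this conversion is given below; the file paths and helper name are hypothetical, and the real annotation files may contain additional fields that are left untouched.
```python
import json

def to_foreground(src_path, dst_path):
    with open(src_path, "r") as f:
        coco = json.load(f)
    # single `foreground` category, consistent with the snippet above
    coco["categories"] = [{"id": 1, "name": "foreground", "supercategory": "foreground"}]
    for ann in coco["annotations"]:
        ann["category_id"] = 1                      # every labeled box becomes foreground
    with open(dst_path, "w") as f:
        json.dump(coco, f)

to_foreground("annotations/instances_train.json", "annotations/instances_train_foreground.json")
```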
<a name="3.3"></a>
### 3.3 Configuration Files
We use `configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml` to train the model; more details are as follows.
[![img](../../images/det/PaddleDetection_config.png)](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/det/PaddleDetection_config.png)
`ppyolov2_r50vd_dcn_365e_coco.yml` depends on other configuration files, their meanings are as follows.
```
coco_detection.yml:     path of the train/eval/test dataset.
runtime.yml:            public runtime parameters, including whether to use GPU, the epoch interval for checkpoint saving, etc.
optimizer_365e.yml:     learning rate and optimizer.
ppyolov2_r50vd_dcn.yml: model architecture and backbone.
ppyolov2_reader.yml:    train/eval/test reader, such as batch size, the number of concurrently loaded sub-processes, etc., as well as post-read pre-processing operations such as resize, data augmentation, etc.
```
In the mainbody detection task, you need to modify `num_classes` in `datasets/coco_detection.yml` to 1 (only `foreground` is included) and modify the paths of the training and testing datasets to those of the customized dataset.
In addition, the above files can also be modified according to the real situation; for example, if the video memory overflows, the batch size and learning rate can be reduced in equal proportion.
<a name="3.4"></a>
### 3.4 Begin the Training Process
PaddleDetection provides several ways to start the training process.
- Training using single GPU
```
# not needed for windows and Mac
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml
```
- Training using multiple GPUs
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval
```
`--eval`: run evaluation while training
- (**Recommended**) Model finetuning: if you want to finetune the mainbody detection model provided by PaddleClas on your own dataset, you can run the following command.
```
export CUDA_VISIBLE_DEVICES=0
# assign pretrain_weights, load the general mainbody-detection pretrained model
python tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml -o pretrain_weights=https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/ppyolov2_r50vd_dcn_mainbody_v1.0_pretrained.pdparams
```
- Resume training
You can use `-r` to load a checkpoint and resume training.
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --eval -r output/ppyolov2_r50vd_dcn_365e_coco/10000
```
Note: If `Out of memory error` occurs, you can try to decrease `batch_size` in `ppyolov2_reader.yml` while reducing learning rate in equal proportion.
<a name="3.5"></a>
### 3.5 Model Prediction
Use the following command to finish the prediction process.
```
export CUDA_VISIBLE_DEVICES=0
python tools/infer.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --infer_img=your_image_path.jpg --output_dir=infer_output/ --draw_threshold=0.5 -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final
```
`--draw_threshold` is an optional parameter; due to NMS, different thresholds will produce different results. `keep_top_k` indicates the maximum number of output targets, with a default value of 100 that can be modified according to the actual situation.
<a name="3.6"></a>
### 3.6 Model Export and Inference Deployment
Use the following to export the inference model
```
python tools/export_model.py -c configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml --output_dir=./inference -o weights=output/ppyolov2_r50vd_dcn_365e_coco/model_final.pdparams
```
The inference model will be saved under the directory `inference/ppyolov2_r50vd_dcn_365e_coco`, which contains `infer_cfg.yml` (optional for mainbody detection), `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.
Note: The inference model exported by `PaddleDetection` is named `model.xxx`; if you want to keep it consistent with PaddleClas, you can rename `model.xxx` to `inference.xxx` for the subsequent inference deployment of mainbody detection.
For more model export tutorials, please refer to [EXPORT_MODEL](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.1/deploy/EXPORT_MODEL.md).
The final directory `inference/ppyolov2_r50vd_dcn_365e_coco` then contains `inference.pdiparams`, `inference.pdiparams.info`, and `inference.pdmodel`, among which `inference.pdiparams` stores the weights of the inference model while `inference.pdmodel` stores its structure.
After exporting the model, the path of the detection model can be changed to the inference model path to complete the prediction task.
Take product recognition as an example: you can modify the field `Global.det_inference_model_dir` in its config file [inference_product.yaml](../../../deploy/configs/inference_product.yaml) to the directory of the exported inference model, and then finish the detection and recognition of the product with reference to [Quick Start for Image Recognition](../quick_start/quick_start_recognition_en.md).
## FAQ
#### Q: Is it compatible with other mainbody detection models?
- A: Yes, but the current preprocessing process only supports PicoDet and YOLO models, so it is recommended to use these two for training. If you want to use other models such as Faster RCNN, you need to revise the logic of preprocessing in accordance with that of PaddleDetection. You are welcomed to resort to Github Issue or WeChat group for any needs or questions.
#### Q: Can I modify the prediction scale of mainbody detection?
- A: Yes, but there are 2 things that require attention:
- The mainbody detection model provided in PaddleClas is trained based on `640x640` resolution, so this is also the default value of prediction process. The accuracy will be reduced if other resolutions are used.
- When exporting the model, it is recommended to modify the resolution of the exported model to keep it consistent with the prediction process.
@ -0,0 +1,120 @@
# Vector Search
Vector search finds wide applications in image recognition and image retrieval. It aims to obtain the similarity ranking for a given query vector by performing a similarity or distance calculation of feature vectors with all the vectors to be queried in an established vector library. In the image recognition system, [Faiss](https://github.com/facebookresearch/faiss) is adopted for corresponding support, please check [the official website of Faiss](https://github.com/facebookresearch/faiss) for more details. The main advantages of `Faiss` can be generalized as the following:
- Great adaptability: support Windows, Linux, and MacOS systems
- Easy installation: support `python` interface and direct installation with `pip`
- Rich algorithms: support a variety of search algorithms to cover different scenarios
- Support both CPU and GPU, which accelerates the search process
It is worth noting that the current version of `PaddleClas` **only uses CPU for vector retrieval** for the moment in pursuit of better adaptability.
[![img](../../images/structure.jpg)](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/structure.jpg)
As shown in the figure above, two parts constitute the vector search in the whole `PP-ShiTu` system.
- The green part: the establishment of search libraries for the search query, while providing functions such as adding and deleting images.
- The blue part: the search function, i.e., given the feature vector of a picture and return the label of similar images in the library.
This document mainly introduces the installation of the search module in PaddleClas, the adopted search algorithms, the library building process, and the parameters in the relevant configuration files.
------
## Catalogue
- [1. Installation of the Search Library](#1)
- [2. Search Algorithms](#2)
- [3. Introduction of Configuration Files](#3)
- [3.1 Parameters of Library Building and Configuration Files](#3.1)
- [3.2 Parameters of Search Configuration Files](#3.2)
<a name="1"></a>
## 1. Installation of the Search Library
`Faiss` can be installed as follows:
```
pip install faiss-cpu==1.7.1post2
```
If the above cannot be properly used, please `uninstall` and then `install` it again, especially when you are using `windows`.
<a name="2"></a>
## 2. Search Algorithms
Currently, the search module in `PaddleClas` supports the following three search algorithms:
- **HNSW32**: a graph indexing method that boasts high retrieval accuracy and fast speed. However, the feature library only supports adding images, not deleting image features. (Default method)
- **IVF**: an inverted index search method with fast speed but slightly lower precision. The feature library supports adding and deleting image features.
- **FLAT**: a brute-force search algorithm with the highest precision but slower retrieval speed for large data volumes. The feature library supports adding and deleting image features.
Each search algorithm can find its right place in different scenarios. `HNSW32`, as the default method, strikes a balance between accuracy and speed, see its detailed introduction in the [official document](https://github.com/facebookresearch/faiss/wiki).
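A minimal `Faiss` sketch of building an index and querying it is shown below; the random features are stand-ins for real embeddings, and the dimensionality follows the `embedding_size: 512` used later in this document.
```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 512
gallery = np.random.rand(1000, d).astype("float32")   # stand-in gallery features
query = np.random.rand(5, d).astype("float32")        # stand-in query features

faiss.normalize_L2(gallery)                           # with inner product, normalization gives cosine similarity
faiss.normalize_L2(query)

index = faiss.index_factory(d, "HNSW32", faiss.METRIC_INNER_PRODUCT)
index.add(gallery)                                    # HNSW32 supports adding but not removing features
scores, ids = index.search(query, 5)                  # top-5 most similar gallery entries per query
print(ids[0], scores[0])
```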
<a name="3"></a>
## 3. Introduction of Configuration Files
Configuration files involving the search module are under `deploy/configs/`, where `build_*.yaml` is related to building the feature library, and `inference_*.yaml` is the inference file for retrieval or classification.
<a name="3.1"></a>
### 3.1 Parameters of Library Building and Configuration Files
The building of the library is detailed as follows:
```
# Enter deploy directory
cd deploy
# Change the yaml file to the specific one you need
python python/build_gallery.py -c configs/build_***.yaml
```
The `yaml` file is configured as follows for library building, please make necessary corrections to fit the real operation. The construction will extract the features of the images under `image_root` according to the image list in `data_file` and store them under `index_dir` for subsequent search.
The `data_file` stores the path and label of each image, one `image_path label` pair per line, with the two fields separated by the `delimiter` parameter in the `yaml` file (a small generation sketch follows the parameter list below).
The specific model parameters for feature extraction can be found in the `yaml` file.
```
# indexing engine config
IndexProcess:
index_method: "HNSW32" # supported: HNSW32, IVF, Flat
index_dir: "./recognition_demo_data_v1.1/gallery_product/index"
image_root: "./recognition_demo_data_v1.1/gallery_product/"
data_file: "./recognition_demo_data_v1.1/gallery_product/data_file.txt"
index_operation: "new" # supported: "append", "remove", "new"
delimiter: "\t"
dist_type: "IP"
embedding_size: 512
```
- **index_method**: the search algorithm. It currently supports three, HNSW32, IVF, and Flat.
- **index_dir**: the folder where the built feature library is stored.
- **image_root**: the location of the folder where the annotated images needed to build the feature library are stored.
- **data_file**: the data list of the annotated images needed to build the feature library, the format of each line: relative_path label.
- **index_operation**: the operation to build a library: `new` for initiating an operation, `append` for adding the image feature of data_file to the feature library, `remove` for deleting the image of data_file from the feature library.
- **delimiter**: delimiter for each line in **data_file**
- **dist_type**: the method of similarity calculation adopted in feature matching, e.g., Inner Product (`IP`) or Euclidean distance (`L2`).
- **embedding_size**: feature dimensionality
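As a small illustration, the hypothetical sketch below generates such a `data_file` by walking `image_root` and using the parent folder name as the label, with `\t` as the delimiter; adapt the paths and labeling rule to your own data.
```python
from pathlib import Path

image_root = Path("./recognition_demo_data_v1.1/gallery_product/")
with open(image_root / "data_file.txt", "w") as f:
    for img in sorted(image_root.rglob("*.jpg")):
        label = img.parent.name                               # assume the folder name is the label
        f.write(f"{img.relative_to(image_root)}\t{label}\n")  # format: relative_path<delimiter>label
```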
<a name="3.2"></a>
### 3.2 Parameters of Search Configuration Files
To integrate the search into the overall `PP-ShiTu` process, please refer to `The Introduction of PP-ShiTu Image Recognition System` in [README](../../../README_en.md). Please check the [Quick Start for Image Recognition](../quick_start/quick_start_recognition_en.md) for the specific operation of the search.
The search part is configured as follows. Please refer to `deploy/configs/inference_*.yaml` for the complete version.
```
IndexProcess:
index_dir: "./recognition_demo_data_v1.1/gallery_logo/index/"
return_k: 5
score_thres: 0.5
```
The following are new parameters other than those of the library building configuration file:
- `return_k`: `k` results are returned
- `score_thres`: the threshold for retrieval and match
@ -0,0 +1,18 @@
欢迎使用PaddleClas图像分类库
================================
.. toctree::
:maxdepth: 2
models_training/index
introduction/index
image_recognition_pipeline/index
others/index
faq_series/index
data_preparation/index
installation/index
models/index
advanced_tutorials/index
algorithm_introduction/index
inference_deployment/index
quick_start/index
@ -0,0 +1,298 @@
# Server-side C++ inference
This tutorial will introduce the detailed steps of deploying the PaddleClas classification model on the server side. The deployment of the recognition model will be supported in the near future. Please look forward to it.
---
## Catalogue
- [1. Prepare the environment](#1)
- [1.1 Compile OpenCV](#1.1)
- [1.2 Compile or download the Paddle Inference Library](#1.2)
- [1.2.1 Compile from the source code](#1.2.1)
- [1.2.2 Direct download and installation](#1.2.2)
- [2. Compile](#2)
- [2.1 Compile PaddleClas C++ inference demo](#2.1)
- [2.2 Compile config lib and cls lib](#2.2)
- [3. Run](#3)
- [3.1 Prepare inference model](#3.1)
- [3.2 Run demo](#3.2)
<a name="1"></a>
## 1. Prepare the environment
### Environment
- Linux, docker is recommended.
- Windows, compilation based on `Visual Studio 2019 Community` is supported. In addition, you can refer to [How to use PaddleDetection to make a complete project](https://zhuanlan.zhihu.com/p/145446681) to compile by generating the `sln solution`.
- This document mainly introduces the compilation and inference of PaddleClas using C++ in Linux environment.
<a name="1.1"></a>
### 1.1 Compile opencv
* First of all, you need to download the opencv source code package from the opencv official website and compile it in the Linux environment. Taking opencv3.4.7 as an example, the download and uncompress commands are as follows.
```
wget https://github.com/opencv/opencv/archive/3.4.7.tar.gz
tar -xf 3.4.7.tar.gz
```
Finally, you can see the folder of `opencv-3.4.7/` in the current directory.
* Compile opencv, the opencv source path (`root_path`) and installation path (`install_path`) should be set by yourself. Among them, `root_path` is the downloaded opencv source code path, and `install_path` is the installation path of opencv. In this case, the opencv source is `./opencv-3.4.7`.
```shell
cd ./opencv-3.4.7
export root_path=$PWD
export install_path=${root_path}/opencv3
```
* After entering the opencv source code path, you can compile it in the following way.
```shell
rm -rf build
mkdir build
cd build
cmake .. \
-DCMAKE_INSTALL_PREFIX=${install_path} \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DWITH_IPP=OFF \
-DBUILD_IPP_IW=OFF \
-DWITH_LAPACK=OFF \
-DWITH_EIGEN=OFF \
-DCMAKE_INSTALL_LIBDIR=lib64 \
-DWITH_ZLIB=ON \
-DBUILD_ZLIB=ON \
-DWITH_JPEG=ON \
-DBUILD_JPEG=ON \
-DWITH_PNG=ON \
-DBUILD_PNG=ON \
-DWITH_TIFF=ON \
-DBUILD_TIFF=ON
make -j
make install
```
* After `make install` is completed, the opencv header file and library file will be generated in this folder for later PaddleClas source code compilation.
Take opencv3.4.7 for example, the final file structure under the opencv installation path is as follows. **NOTICE**: The following file structure may differ for different versions of opencv.
```
opencv3/
|-- bin
|-- include
|-- lib64
|-- share
```
<a name="1.2"></a>
### 1.2 Compile or download the Paddle Inference Library
* There are 2 ways to obtain the Paddle Inference Library, described in detail below.
<a name="1.2.1"></a>
#### 1.2.1 Compile from the source code
* If you want to get the latest Paddle Inference Library features, you can download the latest code from Paddle GitHub repository and compile the inference library from the source code.
* You can refer to [Paddle Inference Library](https://www.paddlepaddle.org.cn/documentation/docs/en/develop/guides/05_inference_deployment/inference/build_and_install_lib_en.html#build-from-source-code) to get the Paddle source code from GitHub and then compile it to generate the latest inference library. The method of using git to access the code is as follows.
```shell
git clone https://github.com/PaddlePaddle/Paddle.git
```
* After entering the Paddle directory, the compilation method is as follows.
```shell
rm -rf build
mkdir build
cd build
cmake .. \
-DWITH_CONTRIB=OFF \
-DWITH_MKL=ON \
-DWITH_MKLDNN=ON \
-DWITH_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_INFERENCE_API_TEST=OFF \
-DON_INFER=ON \
-DWITH_PYTHON=ON
make -j
make inference_lib_dist
```
For more compilation parameter options, please refer to the official website of the Paddle C++ inference library:[https://www.paddlepaddle.org.cn/documentation/docs/en/develop/guides/05_inference_deployment/inference/build_and_install_lib_en.html#build-from-source-code](https://www.paddlepaddle.org.cn/documentation/docs/en/develop/guides/05_inference_deployment/inference/build_and_install_lib_en.html#build-from-source-code).
* After the compilation process, you can see the following files in the folder of `build/paddle_inference_install_dir/`.
```
build/paddle_inference_install_dir/
|-- CMakeCache.txt
|-- paddle
|-- third_party
|-- version.txt
```
Among them, `paddle` is the Paddle library required for C++ prediction later, and `version.txt` contains the version information of the current inference library.
<a name="1.2.2"></a>
#### 1.2.2 Direct download and installation
* Different cuda versions of the Linux inference library (based on GCC 4.8.2) are provided on the
[Paddle Inference Library official website](https://www.paddlepaddle.org.cn/documentation/docs/en/develop/guides/05_inference_deployment/inference/build_and_install_lib_en.html). You can view and select the appropriate version of the inference library on the official website.
* Please select the `develop` version.
* After downloading, use the following method to uncompress.
```
tar -xf paddle_inference.tgz
```
Finally, you can see the folder of `paddle_inference/` in the current directory.
<a name="2"></a>
## 2. Compile
<a name="2.1"></a>
### 2.1 Compile PaddleClas C++ inference demo
* The compilation commands are as follows. The addresses of the Paddle C++ inference library, opencv and other dependencies need to be replaced with the actual addresses on your own machine.
```shell
sh tools/build.sh
```
Specifically, the content in `tools/build.sh` is as follows.
```shell
OPENCV_DIR=your_opencv_dir
LIB_DIR=your_paddle_inference_dir
CUDA_LIB_DIR=your_cuda_lib_dir
CUDNN_LIB_DIR=your_cudnn_lib_dir
TENSORRT_DIR=your_tensorrt_lib_dir
BUILD_DIR=build
rm -rf ${BUILD_DIR}
mkdir ${BUILD_DIR}
cd ${BUILD_DIR}
cmake .. \
-DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=ON \
-DDEMO_NAME=clas_system \
-DWITH_GPU=OFF \
-DWITH_STATIC_LIB=OFF \
-DWITH_TENSORRT=OFF \
-DTENSORRT_DIR=${TENSORRT_DIR} \
-DOPENCV_DIR=${OPENCV_DIR} \
-DCUDNN_LIB=${CUDNN_LIB_DIR} \
-DCUDA_LIB=${CUDA_LIB_DIR}
make -j
```
In the above parameters of command:
* `OPENCV_DIR` is the opencv installation path;
* `LIB_DIR` is the download (`paddle_inference` folder) or the generated Paddle Inference Library path (`build/paddle_inference_install_dir` folder);
* `CUDA_LIB_DIR` is the cuda library file path; in docker it is `/usr/local/cuda/lib64`;
* `CUDNN_LIB_DIR` is the cudnn library file path; in docker it is `/usr/lib/x86_64-linux-gnu/`;
* `TENSORRT_DIR` is the tensorrt library file path; in docker it is `/usr/local/TensorRT6-cuda10.0-cudnn7/`. TensorRT is only needed when GPU inference is enabled.
After the compilation is completed, an executable file named `clas_system` will be generated in the `build` folder.
<a name="2.2"></a>
### 2.2 Compile config lib and cls lib
In addition to compiling the demo directly, you can also compile only config lib and cls lib by running the following command:
```shell
sh tools/build_lib.sh
```
The contents of the above command are as follows:
```shell
OpenCV_DIR=path/to/opencv
PADDLE_LIB_DIR=path/to/paddle
BUILD_DIR=./lib/build
rm -rf ${BUILD_DIR}
mkdir ${BUILD_DIR}
cd ${BUILD_DIR}
cmake .. \
-DOpenCV_DIR=${OpenCV_DIR} \
-DPADDLE_LIB=${PADDLE_LIB_DIR} \
-DCMP_STATIC=ON \
make
```
The specific description of each compilation option is as follows:
* `DOpenCV_DIR`: The directory to the OpenCV compilation library. In this example, it is `opencv-3.4.7/opencv3/share/OpenCV`. Note that there needs to be a `OpenCVConfig.cmake` file under this directory;
* `DPADDLE_LIB`: The directory to the paddle prediction library which generally is the path of `paddle_inference` downloaded and decompressed or compiled, such as `build/paddle_inference_install_dir`. Note that there should be two subdirectories `paddle` and `third_party` in this directory;
* `DCMP_STATIC`: Whether to compile config lib and cls lib into static link library (`.a`). The default is `ON`. If you need to compile into dynamic link library (`.so`), please set it to `OFF`.
After executing the above commands, the dynamic link libraries (`libcls.so` and `libconfig.so`) or static link libraries (`libcls.a` and `libconfig.a`) of config lib and cls lib will be generated in the directory. In [2.1 Compile PaddleClas C++ inference demo](#2.1), you can set the compilation options `DCLS_LIB` and `DCONFIG_LIB` to the paths of the existing link libraries of `cls lib` and `config lib`, which can also be used for development.
<a name="3"></a>
## 3. Run the demo
<a name="3.1"></a>
### 3.1 Prepare the inference model
* You can refer to [Model inference](../../../tools/export_model.py) to export the inference model. After the model is exported, assuming it is placed in the `inference` directory, the directory structure is as follows.
```
inference/
|--cls_infer.pdmodel
|--cls_infer.pdiparams
```
**NOTICE**: Among them, the `cls_infer.pdmodel` file stores the model structure information and the `cls_infer.pdiparams` file stores the model parameter information. The paths of the two files need to correspond to the parameters `cls_model_path` and `cls_params_path` in the configuration file `tools/config.txt`.
<a name="3.2"></a>
### 3.2 Run demo
First, please modify the `tools/config.txt` and `tools/run.sh`.
* Some key fields in `tools/config.txt` are as follows.
    * use_gpu: whether to use GPU.
    * gpu_id: GPU id.
    * gpu_mem: GPU memory.
    * cpu_math_library_num_threads: number of threads for math library acceleration.
    * use_mkldnn: whether to use mkldnn.
    * use_tensorrt: whether to use tensorRT.
    * use_fp16: whether to use Float16 (half precision); it only takes effect when use_tensorrt is set to 1.
    * cls_model_path: model path of the inference model.
    * cls_params_path: params path of the inference model.
    * resize_short_size: short side length of the image after resize.
    * crop_size: image size after center crop.
* You can modify `tools/run.sh` (`./build/clas_system ./tools/config.txt ./docs/imgs/ILSVRC2012_val_00000666.JPEG`):
* ./build/clas_system: the path of executable file compiled;
* ./tools/config.txt: the path of config;
* ./docs/imgs/ILSVRC2012_val_00000666.JPEG: the path of image file to be predicted.
* Then execute the following command to complete the classification of an image.
```shell
sh tools/run.sh
```
* The prediction results will be shown on the screen, which is as follows.
![](../../images/inference_deployment/cpp_infer_result.png)
* In the above results, `class id` represents the id of the category with the highest confidence, and `score` represents the probability that the image belongs to that category.
@ -0,0 +1,103 @@
# Export model
PaddlePaddle supports exporting an inference model for deployment. Compared with training, the inference model files persistently store the network weights and network structure, so the prediction engine can load the inference model and run prediction more efficiently.
---
## Catalogue
- [1. Environmental preparation](#1)
- [2. Export classification model](#2)
- [3. Export mainbody detection model](#3)
- [4. Export recognition model](#4)
- [5. Parameter description](#5)
<a name="1"></a>
## 1. Environmental preparation
First, refer to the [Installing PaddlePaddle](../installation/install_paddle_en.md) and the [Installing PaddleClas](../installation/install_paddleclas_en.md) to prepare environment.
<a name="2"></a>
## 2. Export classification model
Change the working directory to PaddleClas:
```shell
cd /path/to/PaddleClas
```
Taking the classification model ResNet50_vd as an example, download the pre-trained model:
```shell
wget -P ./cls_pretrain/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/legendary_models/ResNet50_vd_pretrained.pdparams
```
The above model weights are trained with the ResNet50_vd model on the ImageNet1k dataset, and the training configuration file is `ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml`. To export the inference model, just run the following command:
```shell
python tools/export_model.py \
-c ./ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml \
-o Global.pretrained_model=./cls_pretrain/ResNet50_vd_pretrained \
-o Global.save_inference_dir=./deploy/models/class_ResNet50_vd_ImageNet_infer
```
<a name="3"></a>
## 3. Export mainbody detection model
For details about exporting the mainbody detection model, please refer to [mainbody detection](../image_recognition_pipeline/mainbody_detection_en.md).
<a name="4"></a>
## 4. Export recognition model
Change the working directory to PaddleClas:
```shell
cd /path/to/PaddleClas
```
Take the feature extraction model in products recognition as an example, download the pretrained model:
```shell
wget -P ./product_pretrain/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained.pdparams
```
The above model weight file is trained with ResNet50_vd on the Aliproduct dataset, and the training configuration file is `ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml`. The command to export the inference model is as follows:
```shell
python3 tools/export_model.py \
-c ./ppcls/configs/Products/ResNet50_vd_Aliproduct.yaml \
-o Global.pretrained_model=./product_pretrain/product_ResNet50_vd_Aliproduct_v1.0_pretrained \
-o Global.save_inference_dir=./deploy/models/product_ResNet50_vd_aliproduct_v1.0_infer
```
Notice that the inference model exported by the above command is truncated at the embedding layer, so the output of the model is an n-dimensional embedding feature.
<a name="5"></a>
## 5. Parameter description
In the above model export command, the configuration file used must be the same as the training configuration file. The following fields in the configuration file are used to configure exporting model parameters.
* `Global.image_shape`: the input data size of the model, which does not contain the batch dimension;
* `Global.save_inference_dir`: the directory where the exported inference model files are saved;
* `Global.pretrained_model`: the path of the model weight file saved during training. This path does not need to contain the suffix `.pdparams`;
The export command will generate the following three files:
* `inference.pdmodel`: stores the model network structure information;
* `inference.pdiparams`: stores the model network weight information;
* `inference.pdiparams.info`: stores the parameter information of the model, which can be ignored for the classification and recognition models;
The exported inference model is deployed with the prediction engine. You can refer to the following docs according to different deployment modes / platforms.
* [Python inference](./python_deploy_en.md)
* [C++ inference](./cpp_deploy_en.md)(Only support classification)
* [Python Whl inference](./whl_deploy_en.md)(Only support classification)
* [PaddleHub Serving inference](./paddle_hub_serving_deploy_en.md)(Only support classification)
* [PaddleServing inference](./paddle_serving_deploy_en.md)
* [PaddleLite inference](./paddle_lite_deploy_en.md)(Only support classification)

@ -0,0 +1,19 @@
inference_deployment
================================
.. toctree::
:maxdepth: 2
export_model_en.md
python_deploy_en.md
cpp_deploy_en.md
paddle_serving_deploy_en.md
paddle_hub_serving_deploy_en.md
paddle_lite_deploy_en.md
whl_deploy_en.md
@ -0,0 +1,236 @@
# Service deployment based on PaddleHub Serving
PaddleClas supports rapid service deployment through PaddleHub. At present, it supports the deployment of image classification; image recognition deployment is not supported yet.
---
## Catalogue
- [1. Introduction](#1)
- [2. Prepare the environment](#2)
- [3. Download inference model](#3)
- [4. Install Service Module](#4)
- [5. Start service](#5)
- [5.1 Start with command line parameters](#5.1)
- [5.2 Start with configuration file](#5.2)
- [6. Send prediction requests](#6)
- [7. User defined service module modification](#7)
<a name="1"></a>
## 1. Introduction
The HubServing service pack is located in `hubserving/clas/` and contains the following files:
```
hubserving/clas/
└─ __init__.py Empty file, required
└─ config.json Configuration file, optional, passed in as a parameter when using configuration to start the service
└─ module.py Main module file, required, contains the complete logic of the service
└─ params.py Parameter file, required, including parameters such as model path, pre- and post-processing parameters
```
<a name="2"></a>
## 2. Prepare the environment
```shell
# Install PaddleHub (version 2.1.0)
pip3 install paddlehub==2.1.0 --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
```
<a name="3"></a>
## 3. Download inference model
Before installing the service module, you need to prepare the inference model and put it in the correct path. The default model path is:
* Model structure file: `PaddleClas/inference/inference.pdmodel`
* Model parameters file: `PaddleClas/inference/inference.pdiparams`
**Notice**:
* The model file path can be viewed and modified in `PaddleClas/deploy/hubserving/clas/params.py`.
* It should be noted that the prefix of model structure file and model parameters file must be `inference`.
* More models provided by PaddleClas can be obtained from the [model library](../models/models_intro_en.md). You can also use models trained by yourself.
<a name="4"></a>
## 4. Install Service Module
* On Linux platform, the examples are as follows.
```shell
cd PaddleClas/deploy
hub install hubserving/clas/
```
* On Windows platform, the examples are as follows.
```shell
cd PaddleClas\deploy
hub install hubserving\clas\
```
<a name="5"></a>
## 5. Start service
<a name="5.1"></a>
### 5.1 Start with command line parameters
This method only supports prediction on CPU. The command is as follows:
```shell
hub serving start --modules Module1==Version1 \
                  --port XXXX \
                  --use_multiprocess \
                  --workers XXXX
```
**parameters**
|parameters|usage|
|-|-|
|--modules/-m|PaddleHub Serving pre-installed model, listed in the form of multiple Module==Version key-value pairs<br>*`When Version is not specified, the latest version is selected by default`*|
|--port/-p|Service port, default is 8866|
|--use_multiprocess|Enable concurrent mode, the default is single-process mode, this mode is recommended for multi-core CPU machines<br>*`Windows operating system only supports single-process mode`*|
|--workers|The number of concurrent tasks specified in concurrent mode, the default is `2*cpu_count-1`, where `cpu_count` is the number of CPU cores|
For example, start service:
```shell
hub serving start -m clas_system
```
This completes the deployment of a service API, using the default port number 8866.
<a name="5.2"></a>
### 5.2 Start with configuration file
This method supports both CPU and GPU. The command is as follows:
```shell
hub serving start --config/-c config.json
```
Wherein, the format of `config.json` is as follows:
```json
{
"modules_info": {
"clas_system": {
"init_args": {
"version": "1.0.0",
"use_gpu": true,
"enable_mkldnn": false
},
"predict_args": {
}
}
},
"port": 8866,
"use_multiprocess": false,
"workers": 2
}
```
- The configurable parameters in `init_args` are consistent with the `_initialize` function interface in `module.py`. Among them:
  - when `use_gpu` is `true`, the GPU is used to start the service;
  - when `enable_mkldnn` is `true`, MKL-DNN is used to accelerate inference.
- The configurable parameters in `predict_args` are consistent with the `predict` function interface in `module.py`.
**Note:**
- When using the configuration file to start the service, other parameters will be ignored.
- If you use GPU prediction (that is, `use_gpu` is set to `true`), you need to set the environment variable CUDA_VISIBLE_DEVICES before starting the service, such as: ```export CUDA_VISIBLE_DEVICES=0```, otherwise you do not need to set it.
- **`use_gpu` and `use_multiprocess` cannot be `true` at the same time.**
- **When both `use_gpu` and `enable_mkldnn` are set to `true` at the same time, GPU is used to run and `enable_mkldnn` will be ignored.**
For example, to use GPU card No. 3 to start the service:
```shell
cd PaddleClas/deploy
export CUDA_VISIBLE_DEVICES=3
hub serving start -c hubserving/clas/config.json
```
<a name="6"></a>
## 6. Send prediction requests
After the service starts, you can use the following command to send a prediction request and obtain the prediction result:
```shell
cd PaddleClas/deploy
python hubserving/test_hubserving.py server_url image_path
```
The script accepts the following parameters; the first two are required:
- **server_url**: The service address, in the format
`http://[ip_address]:[port]/predict/[module_name]`
- **image_path**: The test image path, which can be a single image path or an image directory path
- **batch_size**: [**Optional**] Batch size. Defaults to `1`.
- **resize_short**: [**Optional**] In preprocessing, resize the image so that its shorter side equals this value. Defaults to `256`.
- **crop_size**: [**Optional**] In preprocessing, the center crop size. Defaults to `224`.
- **normalize**: [**Optional**] In preprocessing, whether to normalize. Defaults to `True`.
- **to_chw**: [**Optional**] In preprocessing, whether to transpose the image to `CHW`. Defaults to `True`.
**Notice**:
If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of the model; you need to set `--resize_short=384` and `--crop_size=384`.
**Eg.**
```shell
python hubserving/test_hubserving.py --server_url http://127.0.0.1:8866/predict/clas_system --image_file ./hubserving/ILSVRC2012_val_00006666.JPEG --batch_size 8
```
The returned result is a list, which includes the top-k classification results, the corresponding scores, and the time cost of the prediction. The details are as follows:
```
list: The returned results
└─ list: The result of first picture
└─ list: The top-k classification results, sorted in descending order of score
└─ list: The scores corresponding to the top-k classification results, sorted in descending order of score
└─ float: The time cost of predicting the picture, unit second
```
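For reference, the nested structure above can be unpacked in plain Python as follows. This is only a sketch using stand-in values; the real values come from the script's output.
```python
# Sketch: unpack the nested result structure described above (values are stand-ins).
outputs = [
    [
        ["label_a", "label_b", "label_c"],   # top-k class results for the first image
        [0.85, 0.10, 0.05],                  # corresponding scores, descending
        0.012,                               # prediction time in seconds
    ],
]

for idx, (labels, scores, cost) in enumerate(outputs):
    print(f"image {idx}: top-1 = {labels[0]} ({scores[0]:.4f}), cost = {cost:.3f}s")
```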
**Note** If you need to add, delete or modify the returned fields, you can modify the corresponding module. For the details, refer to the user-defined modification service module in the next section.
<a name="7"></a>
## 7. User defined service module modification
If you need to modify the service logic, the following steps are generally required:
1. Stop service
```shell
hub serving stop --port/-p XXXX
```
2. Modify the code in the corresponding files, such as `module.py` and `params.py`, according to your actual needs. You need to re-install (`hub install hubserving/clas/`) and re-deploy after modifying `module.py`.
After modifying and installing the module, and before deploying it, you can use `python hubserving/clas/module.py` to test the installed service module.
For example, if you need to replace the model used by the deployed service, you need to modify model path parameters `cfg.model_file` and `cfg.params_file` in `params.py`. Of course, other related parameters may need to be modified at the same time. Please modify and debug according to the actual situation.
3. Uninstall old service module
```shell
hub uninstall clas_system
```
4. Install modified service module
```shell
hub install hubserving/clas/
```
5. Restart service
```shell
hub serving start -m clas_system
```
**Note**:
Common parameters can be modified in `params.py`:
* Directory of model files (including the model structure file and the model parameters file):
```python
"inference_model_dir":
```
* The number of Top-k results returned during post-processing:
```python
'topk':
```
* Mapping file corresponding to label and class ID during post-processing:
```python
'class_id_map_file':
```
In order to avoid unnecessary delay and to enable batch prediction, the preprocessing (including resize, crop, and other operations) is completed on the client side, so modify [test_hubserving.py](../../../deploy/hubserving/test_hubserving.py#L35-L52) if necessary.
@ -0,0 +1,270 @@
# Tutorial of PaddleClas Mobile Deployment
This tutorial will introduce how to use [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) to deploy PaddleClas models on mobile phones.
Paddle-Lite is a lightweight inference engine for PaddlePaddle. It provides efficient inference capabilities for mobile phones and IoT devices, and extensively integrates cross-platform hardware to provide lightweight deployment solutions for mobile-side deployment issues.
If you only want to test speed, please refer to [The tutorial of Paddle-Lite mobile-side benchmark test](../extension/paddle_mobile_inference_en.md).
---
## Catalogue
- [1. Preparation](#1)
    - [1.1 Prepare cross-compilation environment](#1.1)
- [1.2 Download inference library for Android or iOS](#1.2)
- [2. Start running](#2)
- [2.1 Inference Model Optimization](#2.1)
- [2.1.1 [RECOMMEND] Use pip to install Paddle-Lite and optimize model](#2.1.1)
- [2.1.2 Compile Paddle-Lite to generate opt tool](#2.1.2)
        - [2.1.3 Demo of getting the optimized model](#2.1.3)
- [2.2 Run optimized model on Phone](#2.2)
- [3. FAQ](#3)
<a name="1"></a>
## 1. Preparation
Paddle-Lite currently supports the following platforms:
- Computer (for compiling Paddle-Lite)
- Mobile phone (arm7 or arm8)
<a name="1.1"></a>
### 1.1 Prepare cross-compilation environment
The cross-compilation environment is used to compile the C++ demos of Paddle-Lite and PaddleClas.
For the detailed compilation directions of different development environments, please refer to the corresponding [document](https://paddle-lite.readthedocs.io/zh/latest/source_compile/compile_env.html).
<a name="1.2"></a>
### 1.2 Download inference library for Android or iOS
|Platform|Inference Library Download Link|
|-|-|
|Android|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.8-rc/Android/gcc/inference_lite_lib.android.armv7.gcc.c++_static.with_extra.with_cv.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.8-rc/Android/gcc/inference_lite_lib.android.armv8.gcc.c++_static.with_extra.with_cv.tar.gz)|
|iOS|[arm7](https://paddlelite-data.bj.bcebos.com/Release/2.8-rc/iOS/inference_lite_lib.ios.armv7.with_cv.with_extra.tiny_publish.tar.gz) / [arm8](https://paddlelite-data.bj.bcebos.com/Release/2.8-rc/iOS/inference_lite_lib.ios.armv8.with_cv.with_extra.tiny_publish.tar.gz)|
**NOTE**:
1. If you download the inference library from [Paddle-Lite official document](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html#android-toolchain-gcc), please choose `with_extra=ON` , `with_cv=ON` .
2. It is recommended to build the inference library from the [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) develop branch if you want to deploy a [quantized](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/quantization/README_en.md) model to mobile phones. Please refer to this [link](https://paddle-lite.readthedocs.io/zh/latest/user_guides/Compile/Android.html#id2) for more detailed information about compiling.
The structure of the inference library is as follows:
```
inference_lite_lib.android.armv8/
|-- cxx C++ inference library and header files
| |-- include C++ header files
| | |-- paddle_api.h
| | |-- paddle_image_preprocess.h
| | |-- paddle_lite_factory_helper.h
| | |-- paddle_place.h
| | |-- paddle_use_kernels.h
| | |-- paddle_use_ops.h
| | `-- paddle_use_passes.h
| `-- lib C++ inference library
| |-- libpaddle_api_light_bundled.a C++ static library
| `-- libpaddle_light_api_shared.so C++ dynamic library
|-- java Java inference library
| |-- jar
| | `-- PaddlePredictor.jar
| |-- so
| | `-- libpaddle_lite_jni.so
| `-- src
|-- demo C++ and java demos
| |-- cxx C++ demos
| `-- java Java demos
```
<a name="2"></a>
## 2. Start running
<a name="2.1"></a>
### 2.1 Inference Model Optimization
Paddle-Lite provides a variety of strategies to automatically optimize the original training model, including quantization, sub-graph fusion, hybrid scheduling, Kernel optimization and so on. In order to make the optimization process more convenient and easy to use, Paddle-Lite provides `opt` tool to automatically complete the optimization steps and output a lightweight, optimal executable model.
**NOTE**: If you have already got the `.nb` file, you can skip this step.
<a name="2.1.1"></a>
#### 2.1.1 [RECOMMEND] Use `pip` to install Paddle-Lite and optimize model
* Use pip to install Paddle-Lite. The following command installs Paddle-Lite 2.8; use the pip that matches your Python version (e.g. `pip3.7`):
```shell
pip install paddlelite==2.8
```
**Note**: The version of the `paddlelite` wheel must match that of the inference library.
* Use `paddle_lite_opt` to optimize inference model, the parameters of `paddle_lite_opt` are as follows:
| Parameters | Explanation |
| ----------------------- | ------------------------------------------------------------ |
| --model_dir | Path to the PaddlePaddle model (non-combined) to be optimized. |
| --model_file | Path to the net structure file of PaddlePaddle model (combined) to be optimized. |
| --param_file | Path to the net weight files of PaddlePaddle model (combined) to be optimized. |
| --optimize_out_type | Type of output model, `protobuf` by default. Supports `protobuf` and `naive_buffer` . Compared with `protobuf`, you can use`naive_buffer` to get a more lightweight serialization/deserialization model. If you need to predict on the mobile-side, please set it to `naive_buffer`. |
| --optimize_out | Path to output model, not needed to add `.nb` suffix. |
| --valid_targets | The executable backend of the model, `arm` by default. Supports one or some of `x86` , `arm` , `opencl` , `npu` , `xpu`. If set more than one, please separate the options by space, and the `opt` tool will choose the best way automatically. If need to support Huawei NPU (DaVinci core carried by Kirin 810/990 SoC), please set it to `npu arm` . |
| --record_tailoring_info | Whether to enable `Cut the Library Files According To the Model` , `false` by default. If need to record kernel and OP infos of optimized model, please set it to `true`. |
In addition, you can run `paddle_lite_opt` to get more detailed information about how to use it.
<a name="2.1.2"></a>
#### 2.1.2 Compile Paddle-Lite to generate `opt` tool
Optimizing model requires Paddle-Lite's `opt` executable file, which can be obtained by compiling the Paddle-Lite. The steps are as follows:
```shell
# get the Paddle-Lite source code; skip this step if you already have it
git clone https://github.com/PaddlePaddle/Paddle-Lite.git
cd Paddle-Lite
git checkout develop
# compile
./lite/tools/build.sh build_optimize_tool
```
After the compilation is complete, the `opt` file is located under `build.opt/lite/api/`.
`opt` tool is used in the same way as `paddle_lite_opt` , please refer to [2.1.1](#2.1.1).
<a name="2.1.3"></a>
#### 2.1.3 Demo of getting the optimized model
Taking the `MobileNetV3_large_x1_0` model of PaddleClas as an example, we will introduce how to use `paddle_lite_opt` to complete the conversion from the pre-trained model to the inference model, and then to the Paddle-Lite optimized model.
```shell
# enter PaddleClas root directory
cd PaddleClas_root_path
# download and uncompress the inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/MobileNetV3_large_x1_0_infer.tar
tar -xf MobileNetV3_large_x1_0_infer.tar
# convert inference model to Paddle-Lite optimized model
paddle_lite_opt --model_file=./MobileNetV3_large_x1_0_infer/inference.pdmodel --param_file=./MobileNetV3_large_x1_0_infer/inference.pdiparams --optimize_out=./MobileNetV3_large_x1_0
```
When the above command completes, there will be a `MobileNetV3_large_x1_0.nb` file in the current directory, which is the converted model file.
<a name="2.2"></a>
### 2.2 Run optimized model on Phone
1. Prepare an Android phone with `arm8`. If the compiled inference library and `opt` file are `armv7`, you need an `arm7` phone and modify `ARM_ABI = arm7` in the Makefile.
2. Install the ADB tool on the computer.
* Install ADB for Mac
It is recommended to use Homebrew to install it.
```shell
brew cask install android-platform-tools
```
* Install ADB for Linux
```shell
sudo apt update
sudo apt install -y wget adb
```
* Install ADB for Windows
To install ADB on Windows, you need to download it from Google's Android platform site: [Download Link](https://developer.android.com/studio).
3. Make sure the phone is connected to the computer, turn on the `USB debugging` option of the phone, and select the `file transfer` mode. Verify whether ADB is installed successfully as follows:
```shell
$ adb devices
List of devices attached
744be294 device
```
If there is `device` output like the above, it means the installation was successful.
4. Prepare optimized model, inference library files, test image and dictionary file used.
```shell
cd PaddleClas_root_path
cd deploy/lite/
# prepare.sh will put the inference library files, the test image and the dictionary files in demo/cxx/clas
sh prepare.sh /{lite inference library path}/inference_lite_lib.android.armv8
# enter the working directory of lite demo
cd /{lite inference library path}/inference_lite_lib.android.armv8/
cd demo/cxx/clas/
# copy the C++ inference dynamic library file (i.e. the .so file) to the debug folder
cp ../../../cxx/lib/libpaddle_light_api_shared.so ./debug/
```
The `prepare.sh` script takes `PaddleClas/deploy/lite/imgs/tabby_cat.jpg` as the test image and copies it to the `demo/cxx/clas/debug/` directory.
You should put the model optimized by `paddle_lite_opt` under the `demo/cxx/clas/debug/` directory. In this example, use the `MobileNetV3_large_x1_0.nb` model file generated in [2.1.3](#2.1.3).
The structure of the clas demo is as follows after the above command is completed:
```
demo/cxx/clas/
|-- debug/
| |--MobileNetV3_large_x1_0.nb class model
| |--tabby_cat.jpg test image
| |--imagenet1k_label_list.txt dictionary file
| |--libpaddle_light_api_shared.so C++ .so file
| |--config.txt config file
|-- config.txt config file
|-- image_classfication.cpp source code
|-- Makefile compile file
```
**NOTE**:
* `imagenet1k_label_list.txt` is the category mapping file of the `ImageNet1k` dataset. If you use a custom category set, you need to replace this category mapping file.
* `config.txt` contains the hyperparameters, as follows:
```shell
clas_model_file ./MobileNetV3_large_x1_0.nb # path of model file
label_path ./imagenet1k_label_list.txt # path of category mapping file
resize_short_size 256 # the short side length after resize
crop_size 224 # side length used for inference after cropping
visualize 0 # whether to visualize. If you set it to 1, an image file named 'clas_result.png' will be generated in the current directory.
```
5. Run Model on Phone
```shell
# run compile to get the executable file 'clas_system'
make -j
# move the compiled executable file to the debug folder
mv clas_system ./debug/
# push the debug folder to Phone
adb push debug /data/local/tmp/
adb shell
cd /data/local/tmp/debug
export LD_LIBRARY_PATH=/data/local/tmp/debug:$LD_LIBRARY_PATH
# the usage of clas_system is as follows:
# ./clas_system "path of config file" "path of test image"
./clas_system ./config.txt ./tabby_cat.jpg
```
**NOTE**: If you make changes to the code, you need to recompile and re-push the `debug` folder to the phone.
The result is as follows:
![](../../images/inference_deployment/lite_demo_result.png)
<a name="3"></a>
## 3. FAQ
Q1: If I want to change the model, do I need to go through the whole process again?
A1: If you have completed the above steps, you only need to replace the `.nb` model file after changing the model. At the same time, you may need to modify the path of the `.nb` file in the config file and replace the category mapping file so that it is compatible with the new model.
Q2: How do I change the test picture?
A2: Replace the test image under the debug folder with the image you want to test, and then push the debug folder to the phone again.
@ -0,0 +1,280 @@
# Model Service Deployment
## Catalogue
- [1. Introduction](#1)
- [2. Installation of Serving](#2)
- [3. Service Deployment for Image Classification](#3)
- [3.1 Model Transformation](#3.1)
- [3.2 Service Deployment and Request](#3.2)
- [4. Service Deployment for Image Recognition](#4)
- [4.1 Model Transformation](#4.1)
- [4.2 Service Deployment and Request](#4.2)
- [5. FAQ](#5)
<a name="1"></a>
## 1. Introduction
[Paddle Serving](https://github.com/PaddlePaddle/Serving) is designed to provide easy deployment of on-line prediction services for deep learning developers, it supports one-click deployment of industrial-grade services, highly concurrent and efficient communication between client and server, and multiple programming languages for client development.
This section, exemplified by the HTTP deployment of a prediction service, describes how to deploy model services in PaddleClas with PaddleServing. Currently, only deployment on the Linux platform is supported; the Windows platform is not supported.
<a name="2"></a>
## 2. Installation of Serving
It is officially recommended to use docker for the installation and environment deployment of Serving. First, pull the docker image and create a Serving-based container.
```
docker pull paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel
nvidia-docker run -p 9292:9292 --name test -dit paddlepaddle/serving:0.7.0-cuda10.2-cudnn7-devel bash
nvidia-docker exec -it test bash
```
Once you are in docker, install the Serving-related python packages.
```
pip3 install paddle-serving-client==0.7.0
pip3 install paddle-serving-server==0.7.0 # CPU
pip3 install paddle-serving-app==0.7.0
pip3 install paddle-serving-server-gpu==0.7.0.post102 #GPU with CUDA10.2 + TensorRT6
# For other GPU environments, confirm the environment before choosing which one to execute
pip3 install paddle-serving-server-gpu==0.7.0.post101 # GPU with CUDA10.1 + TensorRT6
pip3 install paddle-serving-server-gpu==0.7.0.post112 # GPU with CUDA11.2 + TensorRT8
```
- Speed up the installation process by replacing the source with `-i https://pypi.tuna.tsinghua.edu.cn/simple`.
- For other environment configuration and installation, please refer to [Install Paddle Serving using docker](https://github.com/PaddlePaddle/Serving/blob/v0.7.0/doc/Install_EN.md)
- To deploy CPU services, please install the CPU version of serving-server with the following command.
```
pip install paddle-serving-server
```
<a name="3"></a>
## 3. Service Deployment for Image Classification
<a name="3.1"></a>
### 3.1 Model Transformation
When adopting PaddleServing for service deployment, the saved inference model needs to be converted to a Serving model. The following part takes the classic ResNet50_vd model as an example to introduce the deployment of image classification service.
- Enter the working directory:
```
cd deploy/paddleserving
```
- Download the inference model of ResNet50_vd
```
# Download and decompress the ResNet50_vd model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/inference/ResNet50_vd_infer.tar && tar xf ResNet50_vd_infer.tar
```
- Convert the downloaded inference model into a format that is readily deployable by Server with the help of paddle_serving_client.
```
# Convert the ResNet50_vd model
python3 -m paddle_serving_client.convert --dirname ./ResNet50_vd_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./ResNet50_vd_serving/ \
--serving_client ./ResNet50_vd_client/
```
After the transformation, `ResNet50_vd_serving` and `ResNet50_vd_client` will be added to the current folder in the following format:
```
|- ResNet50_vd_serving/
|- __model__
|- __params__
|- serving_server_conf.prototxt
|- serving_server_conf.stream.prototxt
|- ResNet50_vd_client
|- serving_client_conf.prototxt
|- serving_client_conf.stream.prototxt
```
Having obtained the model files, modify the alias name in `serving_server_conf.prototxt` under the directory `ResNet50_vd_serving` by changing `alias_name` in `fetch_var` to `prediction`.
**Notes**: Serving supports input and output renaming to ensure its compatibility with the deployment of different models. In this case, modifying the alias_name of the configuration file is the only step needed to complete the inference and deployment of all kinds of models. The modified serving_server_conf.prototxt is shown below:
```
feed_var {
name: "inputs"
alias_name: "inputs"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "prediction"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
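If you prefer to script this edit instead of changing the file by hand, the snippet below is one way to do it. It is only a sketch: it assumes the file layout shown above, where the `fetch_var` block contains a single `alias_name` field.
```python
# Sketch: rewrite the alias_name inside the fetch_var block to "prediction".
import re

path = "ResNet50_vd_serving/serving_server_conf.prototxt"
with open(path) as f:
    conf = f.read()

# Replace only the alias_name that appears inside the fetch_var { ... } block.
conf = re.sub(r'(fetch_var\s*\{[^}]*?alias_name:\s*")[^"]*(")',
              r"\g<1>prediction\g<2>", conf, flags=re.S)

with open(path, "w") as f:
    f.write(conf)
```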
<a name="3.2"></a>
### 3.2 Service Deployment and Request
Paddleserving's directory contains the code to start the pipeline service and send prediction requests, including:
```
__init__.py
config.yml # Configuration file for starting the service
pipeline_http_client.py # Script for sending pipeline prediction requests by http
pipeline_rpc_client.py # Script for sending pipeline prediction requests by rpc
classification_web_service.py # Script for starting the pipeline server
```
- Start the service
```
# Start the service and the run log is saved in log.txt
python3 classification_web_service.py &>log.txt &
```
Once the service is successfully started, a log similar to the following will be printed in log.txt: ![img](../../../deploy/paddleserving/imgs/start_server.png)
- Send request
```
# Send service request
python3 pipeline_http_client.py
```
Once the service is successfully started, the prediction results will be printed in the cmd window, see the following example:![img](../../../deploy/paddleserving/imgs/results.png)
<a name="4"></a>
## 4. Service Deployment for Image Recognition
When using PaddleServing for service deployment, the saved inference model needs to be converted to a Serving model. The following part, exemplified by the ultra-lightweight model for image recognition in PP-ShiTu, details the deployment of image recognition service.
<a name="4.1"></a>
### 4.1 Model Transformation
- Download inference models for general detection and general recognition
```
cd deploy
# Download and decompress general recognition models
wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar
cd models
tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar
# Download and decompress general detection models
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar
```
- Convert the inference model for recognition into a Serving model:
```
# Convert the recognition model
python3 -m paddle_serving_client.convert --dirname ./general_PPLCNet_x2_5_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./general_PPLCNet_x2_5_lite_v1.0_serving/ \
--serving_client ./general_PPLCNet_x2_5_lite_v1.0_client/
```
After the transformation, `general_PPLCNet_x2_5_lite_v1.0_serving/` and `general_PPLCNet_x2_5_lite_v1.0_client/` will be added to the current folder. Modify the alias name in `serving_server_conf.prototxt` under the directory `general_PPLCNet_x2_5_lite_v1.0_serving/` by changing `alias_name` in `fetch_var` to `features`. The modified serving_server_conf.prototxt is similar to the following:
```
feed_var {
name: "x"
alias_name: "x"
is_lod_tensor: false
feed_type: 1
shape: 3
shape: 224
shape: 224
}
fetch_var {
name: "save_infer_model/scale_0.tmp_1"
alias_name: "features"
is_lod_tensor: true
fetch_type: 1
shape: -1
}
```
- Convert the inference model for detection into a Serving model:
```
# Convert the general detection model
python3 -m paddle_serving_client.convert --dirname ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer/ \
--model_filename inference.pdmodel \
--params_filename inference.pdiparams \
--serving_server ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/ \
--serving_client ./picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/
```
After the transformation, `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` and `picodet_PPLCNet_x2_5_mainbody_lite_v1.0_client/` will be added to the current folder.
**Note:** The alias name in the serving_server_conf.prototxt under the directory`picodet_PPLCNet_x2_5_mainbody_lite_v1.0_serving/` requires no modification.
- Download and decompress the constructed search library index
```
cd ../
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar
```
<a name="4.2"></a>
### 4.2 Service Deployment and Request
**Note:** Since the recognition service involves multiple models, PipeLine is adopted for better performance. This deployment method does not support the windows platform for now.
- Enter the working directory
```
cd ./deploy/paddleserving/recognition
```
Paddleserving's directory contains the code to start the pipeline service and send prediction requests, including:
```
__init__.py
config.yml # Configuration file for starting the service
pipeline_http_client.py # Script for sending pipeline prediction requests by http
pipeline_rpc_client.py # Script for sending pipeline prediction requests by rpc
recognition_web_service.py # Script for starting the pipeline server
```
- Start the service
```
# Start the service and the run log is saved in log.txt
python3 recognition_web_service.py &>log.txt &
```
Once the service is successfully started, a log similar to the following will be printed in log.txt: ![img](../../../deploy/paddleserving/imgs/start_server_shitu.png)
- Send request
```
python3 pipeline_http_client.py
```
Once the service is successfully started, the prediction results will be printed in the cmd window, see the following example: ![img](../../../deploy/paddleserving/imgs/results_shitu.png)
<a name="5"></a>
## 5. FAQ
**Q1**: After sending a request, no result is returned, or the output prompts a decoding error.
**A1**: Please turn off the proxy before starting the service and sending requests. You can try the following commands:
```
unset https_proxy
unset http_proxy
```
For more types of service deployment, such as `RPC prediction services`, you can refer to the [github official website](https://github.com/PaddlePaddle/Serving/tree/v0.7.0/examples) of Serving.
@ -0,0 +1,142 @@
# Inferring based on the Python prediction engine
The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after model training is completed, and is mostly used for prediction in deployment.
The model saved during the training process is the checkpoints model, which saves the parameters of the model and is mostly used to resume training.
Compared with the checkpoints model, the inference model additionally saves the structural information of the model. Therefore, it is easier to deploy, because both the model structure and the model parameters are solidified in the inference model files, and it is suitable for integration with actual systems.
Please refer to the document [install paddle](../installation/install_paddle_en.md) and [install paddleclas](../installation/install_paddleclas_en.md) to prepare the environment.
---
## Catalogue
- [1. Image classification inference](#1)
- [2. Mainbody detection model inference](#2)
- [3. Feature Extraction model inference](#3)
- [4. Concatenation of mainbody detection, feature extraction and vector search](#4)
<a name="1"></a>
## 1. Image classification inference
First, please refer to the document [export model](./export_model_en.md) to prepare the inference model files. All the commands should be run under the `deploy` folder of PaddleClas:
```shell
cd deploy
```
For classification model inference, you can execute the following commands:
```shell
python python/predict_cls.py -c configs/inference_cls.yaml
```
In the configuration file `configs/inference_cls.yaml`, the following fields are used to configure prediction parameters:
* `Global.infer_imgs`: The path of image to be predicted;
* `Global.inference_model_dir`: The directory of inference model files. It should contain the model files (`inference.pdmodel` and `inference.pdiparams`);
* `Global.use_tensorrt`: Whether to use `TensorRT`, `False` by default;
* `Global.use_gpu`: Whether to use GPU, `True` by default;
* `Global.enable_mkldnn`: Whether to use `MKL-DNN`, `False` by default. Valid only when `use_gpu` is `False`;
* `Global.use_fp16`: Whether to use `FP16`, `False` by default;
* `PreProcess`: To config the preprocessing of image to be predicted;
* `PostProcess`: To config the postprocessing of prediction results;
* `PostProcess.Topk.class_id_map_file`: The path of file mapping label and class id. By default ImageNet1k (`./utils/imagenet1k_label_list.txt`).
**Notice**:
* If you use VisionTransformer series models, such as `DeiT_***_384`, `ViT_***_384`, please pay attention to the input size of the model; you may need to specify `PreProcess.resize_short=384` and `PreProcess.resize=384`.
* If you want to improve the speed of the evaluation, it is recommended to enable TensorRT when using GPU, and MKL-DNN when using CPU.
```shell
python python/predict_cls.py -c configs/inference_cls.yaml -o Global.infer_imgs=images/ILSVRC2012_val_00010010.jpeg
```
If you want to use the CPU for prediction, you can switch the value of `use_gpu` in the config file to `False`, or you can execute the following command:
```
python python/predict_cls.py -c configs/inference_cls.yaml -o Global.use_gpu=False
```
<a name="2"></a>
## 2. Mainbody detection model inference
The following introduces the mainbody detection model inference. All the commands should be run under the `deploy` folder of PaddleClas:
```shell
cd deploy
```
For mainbody detection model inference, you can execute the following commands:
```shell
mkdir -p models
cd models
# download mainbody detection inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
cd ..
# predict
python python/predict_det.py -c configs/inference_det.yaml
```
The input example image is as follows:
![](../images/recognition/product_demo/wangzai.jpg)
The output will be:
```text
[{'class_id': 0, 'score': 0.4762245, 'bbox': array([305.55115, 226.05322, 776.61084, 930.42395], dtype=float32), 'label_name': 'foreground'}]
```
And the visualized result is as follows:
![](../images/recognition/product_demo/wangzai_det_result.jpg)
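If you want to draw the returned bounding box yourself instead of relying on the saved visualization, the following is a minimal OpenCV sketch using the example output above; the input image path is a placeholder, so adjust it to where the image lives on your machine:
```python
# Sketch: draw the predicted bbox [x_min, y_min, x_max, y_max] on the input image.
import cv2

bbox = [305.55115, 226.05322, 776.61084, 930.42395]  # from the example output above

img = cv2.imread("images/wangzai.jpg")  # placeholder path to the input image
x1, y1, x2, y2 = (int(v) for v in bbox)
cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.imwrite("wangzai_det_result_manual.jpg", img)
```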
If you want to detect another image, you can change the value of `infer_imgs` in `configs/inference_det.yaml`,
or you can use `-o Global.infer_imgs` argument. For example, if you want to detect `images/anmuxi.jpg`:
```shell
python python/predict_det.py -c configs/inference_det.yaml -o Global.infer_imgs=images/anmuxi.jpg
```
If you want to use the CPU for prediction, you can switch the value of `use_gpu` in the config file to `False`, or you can execute the following command:
```
python python/predict_det.py -c configs/inference_det.yaml -o Global.use_gpu=False
```
<a name="3"></a>
## 3. Feature Extraction model inference
First, please refer to the document [export model](./export_model_en.md) to prepare the inference model files. All the commands should be run under the `deploy` folder of PaddleClas:
```shell
cd deploy
```
For feature extraction model inference, you can execute the following commands:
```shell
mkdir -p models
cd models
# download feature extraction inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd ..
# predict
python python/predict_rec.py -c configs/inference_rec.yaml
```
You can get a 512-dim feature printed in the command line.
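These features are intended to be compared against gallery features during vector search. As a simple illustration (a sketch only, not part of the PaddleClas pipeline), the cosine similarity between two such 512-dim features can be computed as follows, using random vectors as stand-ins:
```python
# Sketch: cosine similarity between two 512-dim feature vectors (random stand-ins).
import numpy as np

feat_a = np.random.rand(512).astype("float32")
feat_b = np.random.rand(512).astype("float32")

feat_a /= np.linalg.norm(feat_a)
feat_b /= np.linalg.norm(feat_b)
print(f"cosine similarity: {float(np.dot(feat_a, feat_b)):.4f}")
```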
If you want to extract feature of another image, you can change the value of `infer_imgs` in `configs/inference_rec.yaml`,
or you can use `-o Global.infer_imgs` argument. For example, if you want to try `images/anmuxi.jpg`:
```shell
python python/predict_rec.py -c configs/inference_rec.yaml -o Global.infer_imgs=images/anmuxi.jpg
```
If you want to use the CPU for prediction, you can switch the value of `use_gpu` in the config file to `False`, or you can execute the following command:
```
python python/predict_rec.py -c configs/inference_rec.yaml -o Global.use_gpu=False
```
<a name="4"></a>
## 4. Concatenation of mainbody detection, feature extraction and vector search
Please refer to [Quick Start of Recognition](../quick_start/quick_start_recognition_en.md)
@ -0,0 +1,256 @@
# PaddleClas wheel package
PaddleClas supports prediction via a Python WHL package. At present, the WHL package only supports image classification; it does not support mainbody detection, feature extraction, or vector retrieval.
---
## Catalogue
- [1. Installation](#1)
- [2. Quick Start](#2)
- [3. Definition of Parameters](#3)
- [4. Usage](#4)
- [4.1 View help information](#4.1)
   - [4.2 Prediction using inference model provided by PaddleClas](#4.2)
- [4.3 Prediction using local model files](#4.3)
- [4.4 Prediction by batch](#4.4)
- [4.5 Prediction of Internet image](#4.5)
- [4.6 Prediction of `NumPy.array` format image](#4.6)
- [4.7 Save the prediction result(s)](#4.7)
- [4.8 Specify the mapping between class id and label name](#4.8)
<a name="1"></a>
## 1. Installation
* installing from pypi
```bash
pip3 install paddleclas==2.2.1
```
* build own whl package and install
```bash
python3 setup.py bdist_wheel
pip3 install dist/*
```
<a name="2"></a>
## 2. Quick Start
* Using the `ResNet50` model provided by PaddleClas, take the following image (`'docs/images/inference_deployment/whl_demo.jpg'`) as an example.
![](../../images/inference_deployment/whl_demo.jpg)
* Python
```python
from paddleclas import PaddleClas
clas = PaddleClas(model_name='ResNet50')
infer_imgs='docs/images/inference_deployment/whl_demo.jpg'
result=clas.predict(infer_imgs)
print(next(result))
```
**Note**: `PaddleClas.predict()` is a `generator`, so you need to use `next()` or a `for` loop to call it iteratively. It performs prediction in batches of `batch_size` and returns the prediction result(s) when called. Examples of returned results are as follows:
```
>>> result
[{'class_ids': [8, 7, 136, 80, 84], 'scores': [0.79368, 0.16329, 0.01853, 0.00959, 0.00239], 'label_names': ['hen', 'cock', 'European gallinule, Porphyrio porphyrio', 'black grouse', 'peacock']}]
```
* CLI
```bash
paddleclas --model_name=ResNet50 --infer_imgs="docs/images/inference_deployment/whl_demo.jpg"
```
```
>>> result
filename: docs/images/inference_deployment/whl_demo.jpg, top-5, class_ids: [8, 7, 136, 80, 84], scores: [0.79368, 0.16329, 0.01853, 0.00959, 0.00239], label_names: ['hen', 'cock', 'European gallinule, Porphyrio porphyrio', 'black grouse', 'peacock']
Predict complete!
```
<a name="3"></a>
## 3. Definition of Parameters
The following parameters can be specified in Command Line or used as parameters of the constructor when instantiating the PaddleClas object in Python.
* model_name(str): If using inference model based on ImageNet1k provided by Paddle, please specify the model's name by the parameter.
* inference_model_dir(str): Local model files directory, which is valid when `model_name` is not specified. The directory should contain `inference.pdmodel` and `inference.pdiparams`.
* infer_imgs(str): The path of image to be predicted, or the directory containing the image files, or the URL of the image from Internet.
* use_gpu(bool): Whether to use GPU or not, default by `True`.
* gpu_mem(int): GPU memory usage, default by `8000`.
* use_tensorrt(bool): Whether to use TensorRT or not. Using it can greatly improve prediction performance, default by `False`.
* enable_mkldnn(bool): Whether enable MKLDNN or not, default `False`.
* cpu_num_threads(int): Assign number of cpu threads, valid when `--use_gpu` is `False` and `--enable_mkldnn` is `True`, default by `10`.
* batch_size(int): Batch size, default by `1`.
* resize_short(int): Resize the shorter side of the image to `resize_short`, default by `256`.
* crop_size(int): Center crop image to `crop_size`, default by `224`.
* topk(int): Print (return) the `topk` prediction results, default by `5`.
* class_id_map_file(str): The mapping file between class ID and label, default by `ImageNet1K` dataset's mapping.
* pre_label_image(bool): Whether to pre-label the images or not, default by `False`.
* save_dir(str): The directory to save the prediction results that can be used as pre-label, default by `None`, that is, not to save.
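The sketch below combines several of the options above in a single Python instantiation; the values are illustrative defaults taken from the parameter list, not a recommended configuration:
```python
# Sketch: instantiate PaddleClas with several of the parameters described above.
from paddleclas import PaddleClas

clas = PaddleClas(
    model_name="ResNet50",   # inference model provided by PaddleClas
    use_gpu=False,           # run on CPU
    enable_mkldnn=True,      # accelerate CPU inference with MKL-DNN
    cpu_num_threads=10,
    batch_size=2,
    resize_short=256,
    crop_size=224,
    topk=5,
)
result = clas.predict("docs/images/inference_deployment/whl_demo.jpg")
print(next(result))
```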
**Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of the model; you need to set `resize_short=384` and `crop_size=384`. The following is a demo.
* CLI:
```bash
paddleclas --model_name=ViT_base_patch16_384 --infer_imgs='docs/images/inference_deployment/whl_demo.jpg' --resize_short=384 --crop_size=384
```
* Python:
```python
from paddleclas import PaddleClas
clas = PaddleClas(model_name='ViT_base_patch16_384', resize_short=384, crop_size=384)
```
<a name="4"></a>
## 4. Usage
PaddleClas provides two usage methods:
1. Python interactive programming;
2. Bash command line programming.
<a name="4.1"></a>
### 4.1 View help information
* CLI
```bash
paddleclas -h
```
<a name="4.2"></a>
### 4.2 Prediction using inference model provided by PaddleClas
You can use the inference model provided by PaddleClas to predict, and only need to specify `model_name`. In this case, PaddleClas will automatically download files of specified model and save them in the directory `~/.paddleclas/`.
* Python
```python
from paddleclas import PaddleClas
clas = PaddleClas(model_name='ResNet50')
infer_imgs = 'docs/images/inference_deployment/whl_demo.jpg'
result=clas.predict(infer_imgs)
print(next(result))
```
* CLI
```bash
paddleclas --model_name='ResNet50' --infer_imgs='docs/images/inference_deployment/whl_demo.jpg'
```
<a name="4.3"></a>
### 4.3 Prediction using local model files
You can use the local model files trained by yourself to predict, and only need to specify `inference_model_dir`. Note that the directory must contain `inference.pdmodel` and `inference.pdiparams`.
* Python
```python
from paddleclas import PaddleClas
clas = PaddleClas(inference_model_dir='./inference/')
infer_imgs = 'docs/images/inference_deployment/whl_demo.jpg'
result=clas.predict(infer_imgs)
print(next(result))
```
* CLI
```bash
paddleclas --inference_model_dir='./inference/' --infer_imgs='docs/images/inference_deployment/whl_demo.jpg'
```
<a name="4.4"></a>
### 4.4 Prediction by batch
You can predict in batches; you only need to specify `batch_size` when `infer_imgs` is a directory containing image files.
* Python
```python
from paddleclas import PaddleClas
clas = PaddleClas(model_name='ResNet50', batch_size=2)
infer_imgs = 'docs/images/'
result=clas.predict(infer_imgs)
for r in result:
print(r)
```
* CLI
```bash
paddleclas --model_name='ResNet50' --infer_imgs='docs/images/' --batch_size 2
```
<a name="4.5"></a>
### 4.5 Prediction of Internet image
You can predict the Internet image, only need to specify URL of Internet image by `infer_imgs`. In this case, the image file will be downloaded and saved in the directory `~/.paddleclas/images/`.
* Python
```python
from paddleclas import PaddleClas
clas = PaddleClas(model_name='ResNet50')
infer_imgs = 'https://raw.githubusercontent.com/paddlepaddle/paddleclas/release/2.2/docs/images/inference_deployment/whl_demo.jpg'
result=clas.predict(infer_imgs)
print(next(result))
```
* CLI
```bash
paddleclas --model_name='ResNet50' --infer_imgs='https://raw.githubusercontent.com/paddlepaddle/paddleclas/release/2.2/docs/images/inference_deployment/whl_demo.jpg'
```
<a name="4.6"></a>
### 4.6 Prediction of NumPy.array format image
In Python code, you can predict an image in `NumPy.array` format; you only need to pass the image data via `infer_imgs`. Note that the models in PaddleClas only support 3-channel image data, and the channel order is `RGB`.
* python
```python
import cv2
from paddleclas import PaddleClas
clas = PaddleClas(model_name='ResNet50')
infer_imgs = cv2.imread("docs/images/inference_deployment/whl_demo.jpg")[:, :, ::-1]
result=clas.predict(infer_imgs)
print(next(result))
```
<a name="4.7"></a>
### 4.7 Save the prediction result(s)
You can save the prediction result(s) as pre-labels; you only need to use `save_dir` to specify the directory to save them in.
* python
```python
from paddleclas import PaddleClas
clas = PaddleClas(model_name='ResNet50', save_dir='./output_pre_label/')
infer_imgs = 'docs/images/inference_deployment/whl_' # it can be infer_imgs folder path which contains all of images you want to predict.
result=clas.predict(infer_imgs)
print(next(result))
```
* CLI
```bash
paddleclas --model_name='ResNet50' --infer_imgs='docs/images/inference_deployment/whl_' --save_dir='./output_pre_label/'
```
<a name="4.8"></a>
### 4.8 Specify the mapping between class id and label name
You can specify the mapping between class id and label name, only need to use `class_id_map_file` to specify the mapping file. PaddleClas uses ImageNet1K's mapping by default.
The content format of mapping file shall be:
```
class_id<space>class_name<\n>
```
For example:
```
0 tench, Tinca tinca
1 goldfish, Carassius auratus
2 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias
......
```
* Python
```python
from paddleclas import PaddleClas
clas = PaddleClas(model_name='ResNet50', class_id_map_file='./ppcls/utils/imagenet1k_label_list.txt')
infer_imgs = 'docs/images/inference_deployment/whl_demo.jpg'
result=clas.predict(infer_imgs)
print(next(result))
```
* CLI
```bash
paddleclas --model_name='ResNet50' --infer_imgs='docs/images/inference_deployment/whl_demo.jpg' --class_id_map_file='./ppcls/utils/imagenet1k_label_list.txt'
```
@ -0,0 +1,8 @@
installation
================================
.. toctree::
:maxdepth: 2
install_paddle_en.md
install_paddleclas_en.md
@ -0,0 +1,100 @@
# Install PaddlePaddle
---
## Catalogue
- [1. Environment requirements](#1)
- [2.(Recommended) Prepare a docker environment](#2)
- [3. Install PaddlePaddle using pip](#3)
- [4. Verify installation](#4)
At present, **PaddleClas** requires **PaddlePaddle** version `>=2.0`. Docker is recommended to run PaddleClas; for more detailed information about docker and nvidia-docker, you can refer to the [tutorial](https://docs.docker.com/get-started/). If you do not want to use docker, you can skip section [2. (Recommended) Prepare a docker environment](#2) and go directly to section [3. Install PaddlePaddle using pip](#3).
<a name="1"></a>
## 1. Environment requirements
- python 3.x
- cuda >= 10.1 (necessary if paddlepaddle-gpu is used)
- cudnn >= 7.6.4 (necessary if paddlepaddle-gpu is used)
- nccl >= 2.1.2 (necessary if distributed training/eval is used)
- gcc >= 8.2
**Recommendations**:
* When CUDA version is 10.1, the driver version `>= 418.39`;
* When CUDA version is 10.2, the driver version `>= 440.33`;
* For more CUDA versions and specific driver versions, please refer to [link](https://docs.nvidia.com/deploy/cuda-compatibility/index.html).
<a name="2"></a>
## 2. (Recommended) Prepare a docker environment
* Switch to the working directory
```shell
cd /home/Projects
```
* Create docker container
The following commands will create a docker container named `ppcls` and map the current working directory to the `/paddle` directory in the container.
```shell
# For GPU users
sudo nvidia-docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0-gpu-cuda10.2-cudnn7 /bin/bash
# For CPU users
sudo docker run --name ppcls -v $PWD:/paddle --shm-size=8G --network=host -it paddlepaddle/paddle:2.1.0 /bin/bash
```
**Notices**:
* The first time you use this docker image, it will be downloaded automatically. Please be patient;
* The above command will create a docker container named ppcls, and there is no need to run the command again when using the container again;
* The parameter `--shm-size=8G` will set the shared memory of the container to 8 GB. If conditions permit, it is recommended to set this parameter to a larger value, such as `64G`;
* You can also access [DockerHub](https://hub.docker.com/r/paddlepaddle/paddle/tags/) to obtain the image adapted to your machine;
* Exit / Enter the docker container:
* After entering the docker container, you can exit the current container by pressing `Ctrl + P + Q` without closing the container;
* To re-enter the container, use the following command:
```shell
sudo docker exec -it ppcls /bin/bash
```
<a name="3"></a>
## 3. Install PaddlePaddle using pip
If you want to use PaddlePaddle on GPU, you can use the following command to install PaddlePaddle.
```bash
pip install paddlepaddle-gpu --upgrade -i https://mirror.baidu.com/pypi/simple
```
If you want to use PaddlePaddle on CPU, you can use the following command to install PaddlePaddle.
```bash
pip install paddlepaddle --upgrade -i https://mirror.baidu.com/pypi/simple
```
**Note:**
* If you have already installed CPU version of PaddlePaddle and want to use GPU version now, you should uninstall CPU version of PaddlePaddle and then install GPU version to avoid package confusion.
* You can also compile PaddlePaddle from source code; please refer to the [PaddlePaddle Installation tutorial](http://www.paddlepaddle.org.cn/install/quick) for more compilation options.
<a name="4"></a>
## 4. Verify Installation
```python
import paddle
paddle.utils.run_check()
```
Check PaddlePaddle version
```bash
python -c "import paddle; print(paddle.__version__)"
```
Note:
* Make sure the compiled source code is later than PaddlePaddle 2.0.
* Indicate `WITH_DISTRIBUTE=ON` when compiling. Please refer to the [Instruction](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/Tables.html#id3) for more details.
* When running in docker, in order to ensure that the container has enough shared memory for the dataloader acceleration of Paddle, please set the parameter `--shm-size=8G` when creating the docker container; if conditions permit, you can set it to a larger value.
@ -0,0 +1,32 @@
# Install PaddleClas
---
## Catalogue
* [1. Clone PaddleClas source code](#1)
* [2. Install requirements](#2)
<a name='1'></a>
## 1. Clone PaddleClas source code
```shell
git clone https://github.com/PaddlePaddle/PaddleClas.git -b develop
```
If it is too slow for you to download from github, you can download PaddleClas from gitee. The command is as follows.
```shell
git clone https://gitee.com/paddlepaddle/PaddleClas.git -b develop
```
<a name='2'></a>
## 2. Install requirements
PaddleClas dependencies are listed in the file `requirements.txt`. You can use the following command to install them:
```
pip install --upgrade -r requirements.txt -i https://mirror.baidu.com/pypi/simple
```
@ -0,0 +1,13 @@
## Features of PaddleClas
PaddleClas is an image recognition toolset for industry and academia, helping users train better computer vision models and apply them in real scenarios. Specifically, it contains the following core features.
- Practical image recognition system: Integrate detection, feature learning, and retrieval modules to be applicable to all types of image recognition tasks. Four sample solutions are provided, including product recognition, vehicle recognition, logo recognition, and animation character recognition.
- Rich library of pre-trained models: Provide a total of 175 ImageNet pre-trained models of 36 series, among which 7 selected series of models support fast structural modification.
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.
- SSLD knowledge distillation: The 14 classification pre-training models generally improved their accuracy by more than 3%; among them, the ResNet50_vd model achieved a Top-1 accuracy of 84.0% on the ImageNet-1k dataset and the Res2Net200_vd pre-training model achieved a Top-1 accuracy of 85.1%.
- Data augmentation: Provide 8 data augmentation algorithms such as AutoAugment, Cutout, Cutmix, etc. with the detailed introduction, code replication, and evaluation of effectiveness in a unified experimental environment.
![](../../images/recognition.gif)
For more information about the quick start of image recognition, algorithm details, model training and evaluation, and prediction and deployment methods, please refer to the [README Tutorial](../../../README_ch.md) on home page.
@ -0,0 +1,8 @@
introduction
================================
.. toctree::
:maxdepth: 2
function_intro_en.md
more_demo/index
@ -0,0 +1,53 @@
# Cartoon Demo Images
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069080-a821e0b7-8a10-4946-bf05-ff093cc16064.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069100-7539d292-1bd8-4655-8a6d-d1f2238bd618.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069103-f91359d4-1197-4a6e-b2f7-434c76a6b704.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069108-ad54ae1d-610d-4cfa-9cd6-8ee8d280d61d.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069114-3c771434-84a8-4e58-961e-d35edfbfe5ef.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069119-e8d85be5-da87-4125-ae8b-9fd4cac139d9.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069124-98c30894-4837-4f2f-8399-3d3ebadfd0a1.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069125-a9edf115-33a1-48bf-9e4f-7edbc4269a1e.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069129-98553a25-00e2-4f0f-9b44-dfc4e4f6b6d1.png " width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069131-f7649bb2-255c-4725-a635-799b8b4d815a.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069135-acb69b89-55db-41ac-9846-e2536ef3d955.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069137-1f0abfdb-6608-432e-bd40-c8e1ab86ef8b.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069140-18c6a439-f117-498d-9cdb-ade71cc2c248.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069145-80452f86-afcf-42b5-8423-328cca9e4750.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069154-63a25c1c-b448-44c2-8baf-eb31952c5476.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069156-1b881c6b-5680-4f9a-aef1-2491af50675d.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069161-8759f3d4-8456-43ea-bf54-99a646d5a109.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069167-937aa847-c661-431c-b3dc-5a3c890b31cd.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069170-43d0dce4-6c62-485d-adf4-364c8467c251.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069175-70bc9e50-b833-4a2a-8a3f-c0775dac49c2.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069179-d01f8a0f-4383-4b08-b064-4e6bb006e745.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069184-d423a84c-c9dd-4125-9dc7-397cae21efc9.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069188-fc4deb80-38a2-4c50-9a29-30cee4c8e374.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069193-77a19ee8-b1e2-4c27-9016-3440a1547470.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069196-5f050524-ac08-4831-89f5-9e9e3ce085c1.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069200-4f963171-c790-4f43-8ca3-2e701ad3731c.jpeg" width = "400" /> </div>

@ -0,0 +1,11 @@
more_demo
================================
.. toctree::
   :maxdepth: 1

   product.md
   logo.md
   cartoon.md
   more_demo.md
   vehicle.md

@ -0,0 +1,65 @@
# Logo Demo Images
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096687-5b562e2d-0653-4be6-861d-1936a4440df2.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096698-4b95eb4b-6638-47dc-ae48-7b40744a31ba.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096701-4a4b2bd9-85f2-4d55-be4b-be6ab5e0fb81.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096706-ef4ad024-7284-4cb3-975a-779fd06b96f5.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096710-620b0495-cc83-4501-a104-dfe20afb53d2.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096713-48e629aa-c637-4603-b005-18570fa94d6d.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096715-709957f2-50bb-4edb-a6e4-e7d5601872c7.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096717-a74744cc-4fb8-4e78-b1cb-20409582ca52.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096721-d4af003c-7945-4591-9e47-4e428dc2628c.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096726-460af6ab-8595-4fb4-9960-4c66b18bee1e.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096728-81494000-92b5-40ad-a6a7-606dae3548a3.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096731-2e980977-9ee6-4e29-bdf7-8397820f70e8.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096732-7d425b45-6b04-4984-948d-278da13dd802.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096735-a9c85c14-5965-4529-a235-ce00035bd7ab.jpg " width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096736-3182efc6-ba43-4cde-9397-88a131f4fed8.jpg " width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096737-91e6fa24-1eb5-4aba-9271-5a3722cbe35b.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096740-f440f89b-5f95-493a-b087-00c7cd3481ef.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096747-31b924e3-ffb2-45ab-872e-4ff923ed04f1.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096752-1f98c937-5d83-4c29-b495-01971b5fb258.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096756-a994c7e2-b9e7-40ba-9934-78c10666217b.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096757-879749e0-9e04-4d1e-a07b-6a4322975a84.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096761-5b682ce8-4f83-4fbb-bfb7-df749912aa8b.png " width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096767-e8f701eb-d0e8-4304-b031-e2bff8c199f3.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096778-ec2ad374-b9fc-427e-9e8b-8e5d2afc6394.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096783-9ec5e04d-19e3-463d-ad9d-7a26202bbb9c.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096788-44f04979-18ca-4ba6-b833-7489b344ffff.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096791-6989451e-157c-4101-8b54-7578b05eb7c9.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096796-cc4477cf-016c-4b19-86c3-61824704ecf5.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096798-ba33ee0d-45b8-48ad-a8fa-14cd643a6976.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096805-e29a2ba8-4785-4ca6-9e0d-596fad6ce8dc.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096812-7d8c57a5-fbae-4496-8144-3b40ac74fef0.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096816-50f0ac3d-f2eb-4011-a34e-58e2e215b7b0.jpg " width = "400" /> </div>

@ -0,0 +1,34 @@
## Demo images
- Product recognition
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277277-7b29f596-35f6-4f00-8d2b-0ef0be57a090.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277291-f7d2b2a1-5790-4f5b-a0e6-f5c52d04a69a.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277300-8ce0d5ce-e0ca-46ea-bb9a-74df0df66ae3.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277308-14a097bd-2bcd-41ce-a9e6-5e9cd0bd8b08.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277311-208ae574-a708-46e2-a41e-c639322913b1.jpg" width = "400" /> </div>
[More demo images](product.md)
- Cartoon character recognition
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069108-ad54ae1d-610d-4cfa-9cd6-8ee8d280d61d.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069100-7539d292-1bd8-4655-8a6d-d1f2238bd618.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069080-a821e0b7-8a10-4946-bf05-ff093cc16064.jpeg" width = "400" /> </div>
[More demo images](cartoon.md)
- Logo recognition
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096687-5b562e2d-0653-4be6-861d-1936a4440df2.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096701-4a4b2bd9-85f2-4d55-be4b-be6ab5e0fb81.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096706-ef4ad024-7284-4cb3-975a-779fd06b96f5.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096713-48e629aa-c637-4603-b005-18570fa94d6d.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096752-1f98c937-5d83-4c29-b495-01971b5fb258.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096767-e8f701eb-d0e8-4304-b031-e2bff8c199f3.jpeg" width = "400" /> </div>
[More demo images](logo.md)
- Car recognition
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243899-c60f0a51-db9b-438a-9f2d-0d2893c200bb.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243905-7eeb938d-d88f-4540-a667-06e08dcf1f55.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243911-735a6ec0-a624-4965-b3cd-2b9f52fa8d65.jpeg" width = "400" /> </div>
[More demo images](vehicle.md)

@ -0,0 +1,179 @@
# Product Demo Images
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277277-7b29f596-35f6-4f00-8d2b-0ef0be57a090.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277287-7bdad02a-8e3c-4e04-861c-95a5dae1f3c6.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277291-f7d2b2a1-5790-4f5b-a0e6-f5c52d04a69a.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277294-80aaab94-5109-41be-97f8-3ada73118963.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277296-2a8d7846-cd2e-454e-8b72-46233da09451.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277300-8ce0d5ce-e0ca-46ea-bb9a-74df0df66ae3.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277302-25c973eb-f9aa-42ce-b9e9-66cee738c241.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277303-3d3460da-c6aa-4994-b585-17bc9f3df504.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277306-20cbef71-cc58-4ae1-965b-4806e82988a9.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277308-14a097bd-2bcd-41ce-a9e6-5e9cd0bd8b08.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277309-be092d1c-6513-472c-8b7f-685f4353ae5b.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277311-208ae574-a708-46e2-a41e-c639322913b1.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277314-72901737-5ef5-4a23-820b-1db58c5e6ca0.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277318-aef4080c-24f2-4d92-be3c-45b500b75584.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277320-8046d0df-1256-41ce-a8d6-6d2c1292462c.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277321-e3864473-6a8e-485f-81f2-562b902d6cff.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277324-0aacc27f-699a-437b-bac0-4a20c90b47b1.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277328-8d28f754-8645-4c05-a9a6-0312bbe2f890.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277333-59da1513-e7e5-455c-ab73-7a3162216923.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277335-454c0423-5398-4348-aaab-e2652fd08999.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277338-a7d09c28-1b86-4cf5-bd79-99d51c5b5311.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277343-9c456d21-8018-4cd5-9c0b-cc7c087fac69.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277345-2ef780f1-d7c9-4cf2-a370-f220a052eb71.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277347-baa4b870-7fca-4d4c-8528-fad720270024.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277351-e0691080-ede4-49ae-9075-d36a41cebf25.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277354-509f8f85-f171-44e9-8ca1-4c3cae77b5fb.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277357-39d572b8-60ee-44db-9e0e-2c0ea2be2ed3.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277359-6caf33f6-2a38-48e5-b349-f4dd1ef2566b.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277362-260daa87-1db7-4f89-ba9c-1b32876fd3b6.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277366-14cfd2f9-d044-4288-843e-463a1816163e.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277368-b0e96341-e030-4e4d-8010-6f7c3bc94d2f.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277370-1f26e4e5-9988-4427-a035-44bfd9d472d6.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277372-27e60b60-cd5c-4b05-ae38-2e9524c627f3.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277374-bd292bb2-e1f9-4d5f-aa49-d67ac571d01b.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277377-b0b8cdb9-8196-4598-ae47-b615914bf6bf.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277382-fc89d18a-a57b-4331-adbb-bda3584fb122.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277386-d676467c-9846-4051-8192-b3e089d01cdc.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277390-83f66d3f-c415-47e6-b651-6b51fbe59bbf.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277394-9895d654-3163-4dd9-882c-ac5a893e2ad3.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277396-9e5e3aa3-6d9e-40ab-a325-2edea452156d.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277399-b92e2092-eabd-45c8-bf36-b2e238167892.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277404-285f943a-de70-48b8-9545-53e229b7350d.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277406-0b7ec434-f064-4985-80f3-c00735b3e32d.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277408-4f9b8b19-42c2-4ba4-bf6d-b95ababe0313.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277409-6df0faf7-71b7-4c9a-a875-36ae7ee7129d.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277411-9c2b364a-749d-465e-a85d-29a69e9ff3ef.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277413-c54a462c-dd3b-4ad0-985d-ef0ec1f216ec.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277414-6d922055-cd59-4f84-b5b6-651209d6336a.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277417-78e1322e-4039-4232-b217-1be4f207f804.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277419-181822a3-bae6-4c4f-9959-59e991c2df6c.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277422-76f09d84-cb47-4332-aa88-a12458cd8993.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277424-a72203b5-1a99-4464-a39c-245f7a891f25.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277429-521ac9a6-e4c3-4c74-9c5b-8e8dd6cddf34.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277433-4f9fb9c8-7930-4303-b54e-a6eace347923.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277434-f3aa3565-a2c5-4c1c-ab44-930a8b073b5f.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277437-90cf1cd7-6a62-4ac4-ac85-3aa534e50cee.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277439-54e168bc-9518-429e-9e97-cb9ca5e811c9.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277441-a3c277d7-c889-4556-b74a-400cadf8b771.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277442-22a0cd38-acd8-4b5a-8e59-c4bea852fb79.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277444-ea662034-c17f-47ba-9ea3-694d3cb0c880.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277448-a71f4a0a-c3cc-4432-a803-843b7c65307f.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277449-0b3a2e98-3e09-4bd6-be32-c35f44154e8a.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277452-e36ccc63-8e39-4973-a336-4ace855d25e6.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277454-bddd9527-b189-4771-ab9e-52085db5a44d.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277455-7ea277ba-bc75-48db-9567-40e1acb56f02.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277460-0f5ee4dc-5ece-45d5-8ef9-666f1be41b76.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277461-37cab773-6341-4c91-b1f4-780d530eab3b.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277465-8f53ef9d-0465-4a90-afac-b1dd3c970b72.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277467-655ddabe-cbe0-4d1f-a30e-c2965428e8d7.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277470-4587e905-3fc8-4dad-84ee-0844ba4d2474.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277473-a155944f-efe3-492a-babc-2f3fe700a99b.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277475-c95ab821-f5ae-427a-8721-8991f9c7f29f.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277479-55b59855-2ed6-4526-9481-6b92b25fef97.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277484-556f0e4c-007b-4f6a-b21f-c485f630cbcb.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277486-a39eb069-bc13-415e-b936-ba294216dfac.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277487-80952841-6a76-4fb3-8049-fe15ce8f7cfb.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277491-e892a6a8-6f9a-46c7-83e0-261cfb92d276.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277494-520f483e-654d-4399-9684-1fcd9778b76e.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277496-54b1ada5-e6a6-4654-a8a6-739511cec750.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277500-ff7e2afd-9cd7-484a-bd1e-362226f5197f.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277501-94489261-bea5-4492-bf3e-98cc8aaa7a7f.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277504-567a32bc-a573-4154-a9cd-6acbec923768.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277506-e893d4d5-43ce-4df1-9f08-3cdf6a8c7e2c.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277509-5766629f-bb92-4552-b34a-647e29b9a89b.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277511-8821708b-09f0-4aab-86dd-40ae3794697a.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277515-ed6a0dff-bd91-4233-a9af-e2744df7c7e0.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277519-1883d6a1-9348-4514-8924-dde27dd38704.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277524-b9d8515c-4df2-410a-b4a6-da098cb9da61.jpg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277526-52a9c666-a799-4921-b371-41d97d7d9242.jpg" width = "400" /> </div>

@ -0,0 +1,33 @@
# Vehicle Demo Images
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243899-c60f0a51-db9b-438a-9f2d-0d2893c200bb.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243904-fdbe2e01-dc7c-449a-8e9e-baea4f85fee4.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243905-7eeb938d-d88f-4540-a667-06e08dcf1f55.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243908-c7f1e3ea-92a7-429b-888c-732b9ec5398f.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243911-735a6ec0-a624-4965-b3cd-2b9f52fa8d65.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243913-baec489a-5463-472b-b5d1-418bcd4eb978.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243916-f50dfcdd-2d5f-48f9-876f-dbc05f4afa30.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243920-7a65ec82-8312-421e-985a-c394f11af28f.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243922-458e6dca-fb80-4baf-951e-9651080dc242.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243926-5df3036b-9ea1-441c-b30a-b4f847df25ab.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243927-7673d94a-fbb0-4a92-a3f3-c879a432a7db.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243928-91082855-c5a7-4a3f-aeea-7a2e51e43183.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243929-88fe7efa-b212-4105-af2f-2248a6cb2877.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243933-49e71d02-8228-40ec-99b2-3ed862bf4ba5.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243935-530fbfa3-0d34-4d9d-bd59-2fde5659f7e5.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243940-d289fc7d-d343-4aa5-a807-9ce09a241ccd.jpeg" width = "400" /> </div>

@ -0,0 +1,35 @@
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
:end
popd

@ -0,0 +1,27 @@
# DLA series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
DLA (Deep Layer Aggregation) is built on the observation that visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves the inference of what and where. Although skip connections have been used to combine layers, such connections are themselves "shallow" and fuse features only through simple, one-step operations. The authors augment standard architectures with deeper aggregation to better fuse information across layers: deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy, yielding networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared with existing branching and merging schemes. [paper](https://arxiv.org/abs/1707.06484)
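To make the aggregation idea concrete, the hypothetical sketch below fuses two feature maps of the same spatial size by concatenating them along the channel axis and projecting the result with a 1x1 convolution. The class name and channel counts are made up for illustration; this is a minimal example of an aggregation node, not the DLA block actually used in PaddleClas.
```python
import paddle
import paddle.nn as nn

class AggregationNode(nn.Layer):
    """Hypothetical sketch of an aggregation node: fuse two same-resolution
    feature maps with channel concatenation followed by a 1x1 projection."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=1, bias_attr=False)
        self.bn = nn.BatchNorm2D(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x1, x2):
        x = paddle.concat([x1, x2], axis=1)   # concatenate along the channel axis
        return self.relu(self.bn(self.conv(x)))

# fuse a 64-channel and a 128-channel feature map into 128 channels
node = AggregationNode(64 + 128, 128)
y = node(paddle.randn([1, 64, 56, 56]), paddle.randn([1, 128, 56, 56]))
print(y.shape)  # [1, 128, 56, 56]
```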
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:-----------------:|:----------:|:---------:|:---------:|:---------:|
| DLA34 | 15.8 | 3.1 | 76.03 | 92.98 |
| DLA46_c | 1.3 | 0.5 | 63.21 | 85.30 |
| DLA46x_c | 1.1 | 0.5 | 64.36 | 86.01 |
| DLA60 | 22.0 | 4.2 | 76.10 | 92.92 |
| DLA60x | 17.4 | 3.5 | 77.53 | 93.78 |
| DLA60x_c | 1.3 | 0.6 | 66.45 | 87.54 |
| DLA102 | 33.3 | 7.2 | 78.93 | 94.52 |
| DLA102x | 26.4 | 5.9 | 78.10 | 94.00 |
| DLA102x2 | 41.4 | 9.3 | 78.85 | 94.45 |
| DLA169 | 53.5 | 11.6 | 78.09 | 94.09 |

@ -0,0 +1,78 @@
# DPN and DenseNet series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)
<a name='1'></a>
## 1. Overview
DenseNet is a network structure proposed in 2017 that won the CVPR best paper award. The network designs a new cross-layer connected block called the dense block. Compared with the bottleneck block in ResNet, the dense block uses a far more aggressive connection pattern: all layers within a block are connected to each other, and each layer takes the outputs of all preceding layers as additional input. DenseNet stacks dense blocks to form a densely connected network. The dense connections make gradients easier to backpropagate, so the network is easier to train and converges faster. DPN (Dual Path Networks) combines DenseNet and ResNeXt, showing that DenseNet can extract new features from the preceding levels while ResNeXt essentially reuses features that have already been extracted. The authors further analyze that ResNeXt has a high feature reuse rate but low redundancy, whereas DenseNet creates new features but with high redundancy. Combining the advantages of the two structures, the authors designed the DPN network, which in the end achieved better results than both ResNeXt and DenseNet under the same FLOPs and parameter budget.
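The connectivity of a dense block can be illustrated with the minimal sketch below: every layer receives the concatenation of all preceding feature maps and adds `growth_rate` new channels. The class name and hyperparameters are hypothetical and chosen only to show the connection pattern; this is not the DenseNet implementation used in PaddleClas.
```python
import paddle
import paddle.nn as nn

class DenseBlock(nn.Layer):
    """Toy dense block: each layer takes the concatenation of all previous
    feature maps as input and produces growth_rate new channels."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.LayerList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2D(channels),
                nn.ReLU(),
                nn.Conv2D(channels, growth_rate, kernel_size=3, padding=1, bias_attr=False),
            ))
            channels += growth_rate   # the next layer sees growth_rate more channels

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(paddle.concat(features, axis=1)))
        return paddle.concat(features, axis=1)

block = DenseBlock(in_channels=64, growth_rate=32, num_layers=4)
y = block(paddle.randn([1, 64, 28, 28]))
print(y.shape)  # [1, 192, 28, 28] -> 64 + 4 * 32 channels
```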
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.DPN.png)
The pretrained models of these two series (10 in total) are currently open-sourced in PaddleClas, and their indicators are shown in the figures above. It is easy to observe that, under the same FLOPs and parameters, DPN achieves higher accuracy than DenseNet. However, because DPN has more branches, its inference speed is slower than that of DenseNet. Since DenseNet264 is the deepest network in the DenseNet series, it has the most parameters; DenseNet161 has the largest width, which gives it the largest FLOPs and the highest accuracy in this series. From the perspective of inference speed, DenseNet161, despite its large FLOPs and high accuracy, is faster than DenseNet264, so it has a greater advantage than DenseNet264.
For the DPN series, the larger the model's FLOPs and parameters, the higher its accuracy. Among them, DPN107 has the largest width, so it also has the largest number of parameters and FLOPs in this series.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| DenseNet121 | 0.757 | 0.926 | 0.750 | | 5.690 | 7.980 |
| DenseNet161 | 0.786 | 0.941 | 0.778 | | 15.490 | 28.680 |
| DenseNet169 | 0.768 | 0.933 | 0.764 | | 6.740 | 14.150 |
| DenseNet201 | 0.776 | 0.937 | 0.775 | | 8.610 | 20.010 |
| DenseNet264 | 0.780 | 0.939 | 0.779 | | 11.540 | 33.370 |
| DPN68 | 0.768 | 0.934 | 0.764 | 0.931 | 4.030 | 10.780 |
| DPN92 | 0.799 | 0.948 | 0.793 | 0.946 | 12.540 | 36.290 |
| DPN98 | 0.806 | 0.951 | 0.799 | 0.949 | 22.220 | 58.460 |
| DPN107 | 0.809 | 0.953 | 0.802 | 0.951 | 35.060 | 82.970 |
| DPN131 | 0.807 | 0.951 | 0.801 | 0.949 | 30.510 | 75.360 |
<a name='3'></a>
## 3. Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
| DenseNet121 | 224 | 256 | 4.371 |
| DenseNet161 | 224 | 256 | 8.863 |
| DenseNet169 | 224 | 256 | 6.391 |
| DenseNet201 | 224 | 256 | 8.173 |
| DenseNet264 | 224 | 256 | 11.942 |
| DPN68 | 224 | 256 | 11.805 |
| DPN92 | 224 | 256 | 17.840 |
| DPN98 | 224 | 256 | 21.057 |
| DPN107 | 224 | 256 | 28.685 |
| DPN131 | 224 | 256 | 28.083 |
<a name='4'></a>
## 4. Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| DenseNet121 | 224 | 256 | 4.16436 | 7.2126 | 10.50221 | 4.40447 | 9.32623 | 15.25175 |
| DenseNet161 | 224 | 256 | 9.27249 | 14.25326 | 20.19849 | 10.39152 | 22.15555 | 35.78443 |
| DenseNet169 | 224 | 256 | 6.11395 | 10.28747 | 13.68717 | 6.43598 | 12.98832 | 20.41964 |
| DenseNet201 | 224 | 256 | 7.9617 | 13.4171 | 17.41949 | 8.20652 | 17.45838 | 27.06309 |
| DenseNet264 | 224 | 256 | 11.70074 | 19.69375 | 24.79545 | 12.14722 | 26.27707 | 40.01905 |
| DPN68 | 224 | 256 | 11.7827 | 13.12652 | 16.19213 | 11.64915 | 12.82807 | 18.57113 |
| DPN92 | 224 | 256 | 18.56026 | 20.35983 | 29.89544 | 18.15746 | 23.87545 | 38.68821 |
| DPN98 | 224 | 256 | 21.70508 | 24.7755 | 40.93595 | 21.18196 | 33.23925 | 62.77751 |
| DPN107 | 224 | 256 | 27.84462 | 34.83217 | 60.67903 | 27.62046 | 52.65353 | 100.11721 |
| DPN131 | 224 | 256 | 28.58941 | 33.01078 | 55.65146 | 28.33119 | 46.19439 | 89.24904 |

@ -0,0 +1,23 @@
# ESNet Series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
ESNet (Enhanced ShuffleNet) is a lightweight network developed by Baidu. Building on ShuffleNetV2, it combines the advantages of MobileNetV3, GhostNet, and PPLCNet to form a faster and more accurate network for ARM devices. Thanks to its excellent performance, [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet), released in PaddleDetection, uses this model as its backbone; combined with a stronger object detection algorithm, the resulting mAP refreshed the SOTA of object detection models on ARM devices in one fell swoop.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | FLOPs<br>(M) | Params<br/>(M) |
|:--:|:--:|:--:|:--:|:--:|
| ESNet_x0_25 | 62.48 | 83.46 | 30.9 | 2.83 |
| ESNet_x0_5 | 68.82 | 88.04 | 67.3 | 3.25 |
| ESNet_x0_75 | 72.24 | 90.45 | 123.7 | 3.87 |
| ESNet_x1_0 | 73.92 | 91.40 | 197.3 | 4.64 |
Please stay tuned for information such as inference speed.

@ -0,0 +1,91 @@
# EfficientNet and ResNeXt101_wsl series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)
<a name='1'></a>
## 1. Overview
EfficientNet is a lightweight NAS-based network released by Google in 2019. EfficientNetB7 refreshed the classification accuracy record of ImageNet-1k at that time. In the paper, the author points out that traditional methods for improving the performance of neural networks mainly work on the width of the network, the depth of the network, and the resolution of the input image.
However, through experiments the author found that balancing these three dimensions is essential for improving both accuracy and efficiency.
Therefore, the author summarized, through a series of experiments, how to scale the three dimensions at the same time.
Based on this compound scaling method, the author built seven further networks, B1-B7, on top of EfficientNetB0; with the same FLOPs and parameters, their accuracy reached the state of the art.
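The compound scaling rule can be sketched as below, using the constants reported in the EfficientNet paper (roughly 1.2x depth, 1.1x width and 1.15x resolution per unit of the compound coefficient phi). The printed numbers are only an approximation of the idea; the released B1-B7 configurations use hand-tuned resolutions and rounding.
```python
# Compound scaling as described in the EfficientNet paper (Tan & Le, 2019):
# depth, width and input resolution grow together with one coefficient phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution multipliers

def compound_scale(phi, base_resolution=224):
    depth_mult = ALPHA ** phi
    width_mult = BETA ** phi
    resolution = round(base_resolution * GAMMA ** phi)
    flops_factor = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi   # roughly 2 ** phi
    return depth_mult, width_mult, resolution, flops_factor

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution ~{r}, FLOPs x{f:.2f}")
```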
ResNeXt is an improved version of ResNet proposed by Facebook in 2016. In 2019, Facebook researchers studied the accuracy ceiling of this series of networks on ImageNet through weakly supervised learning. To distinguish them from the earlier ResNeXt networks, the models in this series carry the suffix WSL, where WSL is the abbreviation of weakly-supervised-learning. In order to obtain stronger feature extraction capability, the researchers further enlarged the network width; the largest model, ResNeXt101_32x48d_wsl, has 800 million parameters. It was trained on 940 million weakly labeled images and then fine-tuned on ImageNet-1k, finally reaching a top-1 accuracy of 85.4% on ImageNet-1k, which is also the highest accuracy so far among networks evaluated at a resolution of 224x224 on ImageNet-1k. In Fix-ResNeXt, the author used a larger image resolution and applied a special Fix strategy to the inconsistency of image data preprocessing between training and testing, which gave ResNeXt101_32x48d_wsl an even higher accuracy. Since it uses the Fix strategy, it is named Fix-ResNeXt101_32x48d_wsl.
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png)
![](../../images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png)
At present, PaddleClas open-sources a total of 14 pretrained models of these two series. It can be seen from the figures above that the advantages of the EfficientNet series are very obvious. The ResNeXt101_wsl series uses more training data, so its final accuracy is also higher. EfficientNetB0_small removes the SE block from EfficientNetB0, which gives it faster inference speed.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNeXt101_<br>32x8d_wsl | 0.826 | 0.967 | 0.822 | 0.964 | 29.140 | 78.440 |
| ResNeXt101_<br>32x16d_wsl | 0.842 | 0.973 | 0.842 | 0.972 | 57.550 | 152.660 |
| ResNeXt101_<br>32x32d_wsl | 0.850 | 0.976 | 0.851 | 0.975 | 115.170 | 303.110 |
| ResNeXt101_<br>32x48d_wsl | 0.854 | 0.977 | 0.854 | 0.976 | 173.580 | 456.200 |
| Fix_ResNeXt101_<br>32x48d_wsl | 0.863 | 0.980 | 0.864 | 0.980 | 354.230 | 456.200 |
| EfficientNetB0 | 0.774 | 0.933 | 0.773 | 0.935 | 0.720 | 5.100 |
| EfficientNetB1 | 0.792 | 0.944 | 0.792 | 0.945 | 1.270 | 7.520 |
| EfficientNetB2 | 0.799 | 0.947 | 0.803 | 0.950 | 1.850 | 8.810 |
| EfficientNetB3 | 0.812 | 0.954 | 0.817 | 0.956 | 3.430 | 11.840 |
| EfficientNetB4 | 0.829 | 0.962 | 0.830 | 0.963 | 8.290 | 18.760 |
| EfficientNetB5 | 0.836 | 0.967 | 0.837 | 0.967 | 19.510 | 29.610 |
| EfficientNetB6 | 0.840 | 0.969 | 0.842 | 0.968 | 36.270 | 42.000 |
| EfficientNetB7 | 0.843 | 0.969 | 0.844 | 0.971 | 72.350 | 64.920 |
| EfficientNetB0_<br>small | 0.758 | 0.926 | | | 0.720 | 4.650 |
<a name='3'></a>
## 3. Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------------------------|-----------|-------------------|--------------------------|
| ResNeXt101_<br>32x8d_wsl | 224 | 256 | 19.127 |
| ResNeXt101_<br>32x16d_wsl | 224 | 256 | 23.629 |
| ResNeXt101_<br>32x32d_wsl | 224 | 256 | 40.214 |
| ResNeXt101_<br>32x48d_wsl | 224 | 256 | 59.714 |
| Fix_ResNeXt101_<br>32x48d_wsl | 320 | 320 | 82.431 |
| EfficientNetB0 | 224 | 256 | 2.449 |
| EfficientNetB1 | 240 | 272 | 3.547 |
| EfficientNetB2 | 260 | 292 | 3.908 |
| EfficientNetB3 | 300 | 332 | 5.145 |
| EfficientNetB4 | 380 | 412 | 7.609 |
| EfficientNetB5 | 456 | 488 | 12.078 |
| EfficientNetB6 | 528 | 560 | 18.381 |
| EfficientNetB7 | 600 | 632 | 27.817 |
| EfficientNetB0_<br>small | 224 | 256 | 1.692 |
<a name='4'></a>
## 4. Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| ResNeXt101_<br>32x8d_wsl | 224 | 256 | 18.19374 | 21.93529 | 34.67802 | 18.52528 | 34.25319 | 67.2283 |
| ResNeXt101_<br>32x16d_wsl | 224 | 256 | 18.52609 | 36.8288 | 62.79947 | 25.60395 | 71.88384 | 137.62327 |
| ResNeXt101_<br>32x32d_wsl | 224 | 256 | 33.51391 | 70.09682 | 125.81884 | 54.87396 | 160.04337 | 316.17718 |
| ResNeXt101_<br>32x48d_wsl | 224 | 256 | 50.97681 | 137.60926 | 190.82628 | 99.01698256 | 315.91261 | 551.83695 |
| Fix_ResNeXt101_<br>32x48d_wsl | 320 | 320 | 78.62869 | 191.76039 | 317.15436 | 160.0838242 | 595.99296 | 1151.47384 |
| EfficientNetB0 | 224 | 256 | 3.40122 | 5.95851 | 9.10801 | 3.442 | 6.11476 | 9.3304 |
| EfficientNetB1 | 240 | 272 | 5.25172 | 9.10233 | 14.11319 | 5.3322 | 9.41795 | 14.60388 |
| EfficientNetB2 | 260 | 292 | 5.91052 | 10.5898 | 17.38106 | 6.29351 | 10.95702 | 17.75308 |
| EfficientNetB3 | 300 | 332 | 7.69582 | 16.02548 | 27.4447 | 7.67749 | 16.53288 | 28.5939 |
| EfficientNetB4 | 380 | 412 | 11.55585 | 29.44261 | 53.97363 | 12.15894 | 30.94567 | 57.38511 |
| EfficientNetB5 | 456 | 488 | 19.63083 | 56.52299 | - | 20.48571 | 61.60252 | - |
| EfficientNetB6 | 528 | 560 | 30.05911 | - | - | 32.62402 | - | - |
| EfficientNetB7 | 600 | 632 | 47.86087 | - | - | 53.93823 | - | - |
| EfficientNetB0_small | 224 | 256 | 2.39166 | 4.36748 | 6.96002 | 2.3076 | 4.71886 | 7.21888 |

@ -0,0 +1,75 @@
# HRNet series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)
<a name='1'></a>
## 1. Overview
HRNet is a neural network proposed by Microsoft Research Asia in 2019. Different from previous convolutional neural networks, this network maintains high-resolution representations in its deep layers, so the predicted keypoint heatmaps are more accurate and spatially more precise. In addition, the network performs particularly well in other resolution-sensitive visual tasks, such as detection and segmentation.
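The key idea of keeping a high-resolution path while repeatedly exchanging information with lower-resolution branches can be illustrated with the highly simplified, hypothetical sketch below: the low-resolution features are projected and upsampled before being added to the high-resolution branch, and the high-resolution features are downsampled with a strided convolution before being added to the low-resolution branch. The real HRNet repeats such fusions across several stages with more branches.
```python
import paddle
import paddle.nn as nn

class TwoBranchFusion(nn.Layer):
    """Simplified HRNet-style fusion between a high- and a low-resolution branch."""
    def __init__(self, high_ch=32, low_ch=64):
        super().__init__()
        # high -> low: a strided 3x3 conv halves the spatial resolution
        self.down = nn.Conv2D(high_ch, low_ch, kernel_size=3, stride=2, padding=1)
        # low -> high: 1x1 conv to match channels, then 2x nearest upsampling
        self.up = nn.Sequential(
            nn.Conv2D(low_ch, high_ch, kernel_size=1),
            nn.Upsample(scale_factor=2, mode="nearest"),
        )

    def forward(self, x_high, x_low):
        return x_high + self.up(x_low), x_low + self.down(x_high)

fuse = TwoBranchFusion()
h, l = fuse(paddle.randn([1, 32, 56, 56]), paddle.randn([1, 64, 28, 28]))
print(h.shape, l.shape)  # [1, 32, 56, 56] [1, 64, 28, 28]
```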
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.HRNet.png)
At present, PaddleClas has open-sourced 7 pretrained models of this series, and their indicators are shown in the figures above. Among them, the abnormal accuracy of the HRNet_W48_C indicator may be caused by fluctuations during training.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| HRNet_W18_C | 0.769 | 0.934 | 0.768 | 0.934 | 4.140 | 21.290 |
| HRNet_W18_C_ssld | 0.816 | 0.958 | 0.768 | 0.934 | 4.140 | 21.290 |
| HRNet_W30_C | 0.780 | 0.940 | 0.782 | 0.942 | 16.230 | 37.710 |
| HRNet_W32_C | 0.783 | 0.942 | 0.785 | 0.942 | 17.860 | 41.230 |
| HRNet_W40_C | 0.788 | 0.945 | 0.789 | 0.945 | 25.410 | 57.550 |
| HRNet_W44_C | 0.790 | 0.945 | 0.789 | 0.944 | 29.790 | 67.060 |
| HRNet_W48_C | 0.790 | 0.944 | 0.793 | 0.945 | 34.580 | 77.470 |
| HRNet_W48_C_ssld | 0.836 | 0.968 | 0.793 | 0.945 | 34.580 | 77.470 |
| HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 |
| SE_HRNet_W64_C_ssld | 0.847 | 0.973 | | | 57.830 | 128.970 |
<a name='3'></a>
## 3. Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
| HRNet_W18_C | 224 | 256 | 7.368 |
| HRNet_W18_C_ssld | 224 | 256 | 7.368 |
| HRNet_W30_C | 224 | 256 | 9.402 |
| HRNet_W32_C | 224 | 256 | 9.467 |
| HRNet_W40_C | 224 | 256 | 10.739 |
| HRNet_W44_C | 224 | 256 | 11.497 |
| HRNet_W48_C | 224 | 256 | 12.165 |
| HRNet_W48_C_ssld | 224 | 256 | 12.165 |
| HRNet_W64_C | 224 | 256 | 15.003 |
<a name='4'></a>
## 4. Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| HRNet_W18_C | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 |
| HRNet_W18_C_ssld | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 |
| HRNet_W30_C | 224 | 256 | 8.98077 | 14.08082 | 21.23527 | 9.57594 | 17.35485 | 32.6933 |
| HRNet_W32_C | 224 | 256 | 8.82415 | 14.21462 | 21.19804 | 9.49807 | 17.72921 | 32.96305 |
| HRNet_W40_C | 224 | 256 | 11.4229 | 19.1595 | 30.47984 | 12.12202 | 25.68184 | 48.90623 |
| HRNet_W44_C | 224 | 256 | 12.25778 | 22.75456 | 32.61275 | 13.19858 | 32.25202 | 59.09871 |
| HRNet_W48_C | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 |
| HRNet_W48_C_ssld | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 |
| HRNet_W64_C | 224 | 256 | 15.10428 | 27.68901 | 40.4198 | 17.57527 | 47.9533 | 97.11228 |
| SE_HRNet_W64_C_ssld | 224 | 256 | 32.33651 | 69.31189 | 116.07245 | 31.69770 | 94.99546 | 174.45766 |

@ -0,0 +1,21 @@
# HarDNet series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
HarDNet (Harmonic DenseNet) is a neural network proposed by National Tsing Hua University in 2019 that aims at high efficiency in terms of both low MACs and low memory traffic. The network reduces inference time by 35%, 36%, 30%, 32%, and 45% compared with FC-DenseNet-103, DenseNet-264, ResNet-50, ResNet-152, and SSD-VGG, respectively. The authors use tools including the Nvidia profiler and ARM Scale-Sim to measure memory traffic and verify that the inference latency is indeed proportional to the memory traffic consumption and that the proposed network consumes little memory traffic. [Paper](https://arxiv.org/abs/1909.00948)
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
| HarDNet68 | 17.6 | 4.3 | 75.46 | 92.65 |
| HarDNet85 | 36.7 | 9.1 | 77.44 | 93.55 |
| HarDNet39_ds | 3.5 | 0.4 | 71.33 | 89.98 |
| HarDNet68_ds | 4.2 | 0.8 | 73.62 | 91.52 |

@ -0,0 +1,74 @@
# Inception series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)
<a name='1'></a>
## 1. Overview
GoogLeNet is a neural network structure designed by Google in 2014 which, together with the VGG network, became one of the twin champions of the ImageNet challenge that year. GoogLeNet introduces the Inception structure for the first time and stacks Inception modules so that the network reaches 22 layers, which also marks the first time a convolutional network exceeded 20 layers. Since 1x1 convolutions are used inside the Inception structure to reduce the number of channels, and global pooling replaces the traditional multiple fully connected layers for processing features, the final GoogLeNet network has far fewer FLOPs and parameters than the VGG network, making it a highlight of neural network design at that time.
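The saving from 1x1 dimension reduction is easy to check with a back-of-the-envelope calculation, as in the sketch below, which counts multiply-accumulate operations for a 5x5 convolution with and without a 1x1 reduction layer. The feature-map size and channel counts are illustrative, not an exact reproduction of a GoogLeNet stage.
```python
# MAC count for a 5x5 convolution on a 28x28 feature map, with and without
# a 1x1 bottleneck that first reduces 192 channels to 16 (illustrative numbers).
H = W = 28

def conv_macs(in_ch, out_ch, kernel):
    return H * W * in_ch * out_ch * kernel * kernel

direct = conv_macs(192, 32, 5)
reduced = conv_macs(192, 16, 1) + conv_macs(16, 32, 5)
print(f"direct 5x5:       {direct / 1e6:.1f} M MACs")
print(f"1x1 reduce + 5x5: {reduced / 1e6:.1f} M MACs")
print(f"saving:           ~{direct / reduced:.1f}x")
```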
InceptionV3 is Google's improvement of InceptionV2. The author first optimized the Inception module in InceptionV3, and designed and used more types of Inception modules. Furthermore, the larger square two-dimensional convolution kernels in some Inception modules were factorized into two smaller asymmetric convolution kernels, which greatly reduces the number of parameters.
Xception is another improvement on InceptionV3 proposed by Google after Inception. In Xception, the author replaced traditional convolution operations with depthwise separable convolutions, which greatly reduced the network's FLOPs and number of parameters while improving accuracy. In DeeplabV3+, the author further improved Xception by increasing the number of layers, designing the Xception65 and Xception71 networks.
InceptionV4 is a neural network designed by Google in 2016, at a time when residual structures were all the rage; however, the authors believed that high performance could be achieved using Inception structures alone. InceptionV4 uses more Inception modules to achieve even higher accuracy on ImageNet-1k.
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.Inception.png)
The figures above show the relationship between the accuracy of the Xception series and InceptionV4 and other indicators. Among them, Xception_deeplab is consistent with the structure in the paper, while Xception is an improved model developed by PaddleClas, which improves accuracy by about 0.6% while keeping the inference speed basically unchanged. Details of the improved model are being updated, so stay tuned.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| GoogLeNet | 0.707 | 0.897 | 0.698 | | 2.880 | 8.460 |
| Xception41 | 0.793 | 0.945 | 0.790 | 0.945 | 16.740 | 22.690 |
| Xception41<br>_deeplab | 0.796 | 0.944 | | | 18.160 | 26.730 |
| Xception65 | 0.810 | 0.955 | | | 25.950 | 35.480 |
| Xception65<br>_deeplab | 0.803 | 0.945 | | | 27.370 | 39.520 |
| Xception71 | 0.811 | 0.955 | | | 31.770 | 37.280 |
| InceptionV3 | 0.791 | 0.946 | 0.788 | 0.944 | 11.460 | 23.830 |
| InceptionV4 | 0.808 | 0.953 | 0.800 | 0.950 | 24.570 | 42.680 |
<a name='3'></a>
## 3. Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------------|-----------|-------------------|--------------------------|
| GoogLeNet | 224 | 256 | 1.807 |
| Xception41 | 299 | 320 | 3.972 |
| Xception41_<br>deeplab | 299 | 320 | 4.408 |
| Xception65 | 299 | 320 | 6.174 |
| Xception65_<br>deeplab | 299 | 320 | 6.464 |
| Xception71 | 299 | 320 | 6.782 |
| InceptionV4 | 299 | 320 | 11.141 |
<a name='4'></a>
## 4. Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| GoogLeNet | 299 | 320 | 1.75451 | 3.39931 | 4.71909 | 1.88038 | 4.48882 | 6.94035 |
| Xception41 | 299 | 320 | 2.91192 | 7.86878 | 15.53685 | 4.96939 | 17.01361 | 32.67831 |
| Xception41_<br>deeplab | 299 | 320 | 2.85934 | 7.2075 | 14.01406 | 5.33541 | 17.55938 | 33.76232 |
| Xception65 | 299 | 320 | 4.30126 | 11.58371 | 23.22213 | 7.26158 | 25.88778 | 53.45426 |
| Xception65_<br>deeplab | 299 | 320 | 4.06803 | 9.72694 | 19.477 | 7.60208 | 26.03699 | 54.74724 |
| Xception71 | 299 | 320 | 4.80889 | 13.5624 | 27.18822 | 8.72457 | 31.55549 | 69.31018 |
| InceptionV3 | 299 | 320 | 3.67502 | 6.36071 | 9.82645 | 6.64054 | 13.53630 | 22.17355 |
| InceptionV4 | 299 | 320 | 9.50821 | 13.72104 | 20.27447 | 12.99342 | 25.23416 | 43.56121 |

@ -0,0 +1,24 @@
# LeViT series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
LeViT is a hybrid neural network for fast-inference image classification. Its design takes the performance of the model on different hardware platforms into account, so it better reflects the real scenarios of common applications. Through a large number of experiments, the authors found a better way to combine convolutional networks with the Transformer architecture, and proposed an attention-based method to integrate positional information into the Transformer. [Paper](https://arxiv.org/abs/2104.01136)
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(M) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| LeViT-128S | 0.7598 | 0.9269 | 0.766 | 0.929 | 305 | 7.8 |
| LeViT-128 | 0.7810 | 0.9371 | 0.786 | 0.940 | 406 | 9.2 |
| LeViT-192 | 0.7934 | 0.9446 | 0.800 | 0.947 | 658 | 11 |
| LeViT-256 | 0.8085 | 0.9497 | 0.816 | 0.954 | 1120 | 19 |
| LeViT-384 | 0.8191 | 0.9551 | 0.826 | 0.960 | 2353 | 39 |
**Note**: The difference in accuracy from the reference is due to differences in data preprocessing and the absence of the distillation head as output.

@ -0,0 +1,27 @@
# MixNet series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
MixNet is a lightweight network proposed by Google. The main idea of MixNet is to explore combining kernels of different sizes. The authors found that existing networks have the following two problems:
- Small convolution kernels have a small receptive field and few parameters, but the accuracy is not high.
- Larger convolution kernels have a larger receptive field and higher accuracy, but the number of parameters also increases a lot.
To solve these two problems, MDConv (mixed depthwise convolution) is proposed, which mixes kernels of different sizes within a single convolution block. Based on AutoML, a series of networks called MixNets are then proposed, achieving good results on ImageNet. [paper](https://arxiv.org/pdf/1907.09595.pdf)
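A minimal, hypothetical sketch of the MDConv idea is shown below: the input channels are split into groups, each group is processed by a depthwise convolution with its own kernel size, and the results are concatenated again. The class name, kernel sizes, and channel split are chosen for illustration and do not reproduce the MixNet implementation.
```python
import paddle
import paddle.nn as nn

class MDConv(nn.Layer):
    """Sketch of mixed depthwise convolution: each channel group gets a
    depthwise conv with a different kernel size."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += channels - sum(splits)        # absorb any remainder
        self.splits = splits
        self.convs = nn.LayerList([
            nn.Conv2D(ch, ch, kernel_size=k, padding=k // 2, groups=ch)
            for ch, k in zip(splits, kernel_sizes)
        ])

    def forward(self, x):
        chunks = paddle.split(x, self.splits, axis=1)
        return paddle.concat([conv(c) for conv, c in zip(self.convs, chunks)], axis=1)

mdconv = MDConv(48)
y = mdconv(paddle.randn([1, 48, 32, 32]))
print(y.shape)  # [1, 48, 32, 32]
```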
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | FLOPs<br>(M) | Params<br/>(M) |
| :------: | :---: | :---: | :---------------: | :----------: | ------------- |
| MixNet_S | 76.28 | 92.99 | 75.8 | 252.977 | 4.167 |
| MixNet_M | 77.67 | 93.64 | 77.0 | 357.119 | 5.065 |
| MixNet_L | 78.60 | 94.37 | 78.9 | 579.017 | 7.384 |
Inference speed and other information are coming soon.

@ -0,0 +1,158 @@
# Mobile and Embedded Vision Applications Network series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed and storage size based on SD855](#3)
* [4. Inference speed based on T4 GPU](#4)
<a name='1'></a>
## 1. Overview
MobileNetV1 is a network launched by Google in 2017 for use on mobile or embedded devices. The network replaces traditional convolution operations with depthwise separable convolutions, that is, the combination of a depthwise convolution and a pointwise convolution. Compared with the traditional convolution operation, this combination can greatly reduce the number of parameters and the amount of computation. At the same time, MobileNetV1 can also be used for object detection, image segmentation, and other visual tasks.
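The saving from replacing a standard convolution with a depthwise separable one can be checked with a quick calculation, as in the sketch below; the feature-map size and channel counts are illustrative and not taken from the MobileNetV1 configuration.
```python
# Parameter and multiply-accumulate comparison between a standard 3x3 conv
# and its depthwise separable replacement (3x3 depthwise + 1x1 pointwise),
# for an illustrative 112x112 feature map with 64 -> 128 channels (no bias).
H = W = 112
C_IN, C_OUT, K = 64, 128, 3

std_params = K * K * C_IN * C_OUT
std_macs = std_params * H * W

dw_params = K * K * C_IN          # depthwise: one KxK filter per input channel
pw_params = C_IN * C_OUT          # pointwise: 1x1 conv mixing the channels
sep_macs = (dw_params + pw_params) * H * W

print(f"standard conv : {std_params:,} params, {std_macs / 1e6:.0f} M MACs")
print(f"separable conv: {dw_params + pw_params:,} params, {sep_macs / 1e6:.0f} M MACs")
print(f"reduction     : ~{std_params / (dw_params + pw_params):.1f}x")
```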
MobileNetV2 is a lightweight network proposed by Google following MobileNetV1. Compared with MobileNetV1, MobileNetV2 proposed Linear bottlenecks and Inverted residual block as a basic network structures, to constitute MobileNetV2 network architecture through stacking these basic module a lot. In the end, higher classification accuracy was achieved when FLOPs was only half of MobileNetV1.
The ShuffleNet series network is the lightweight network structure proposed by MEGVII. So far, there are two typical structures in this series network, namely, ShuffleNetV1 and ShuffleNetV2. A Channel Shuffle operation in ShuffleNet can exchange information between groups and perform end-to-end training. In the paper of ShuffleNetV2, the author proposes four criteria for designing lightweight networks, and designs the ShuffleNetV2 network according to the four criteria and the shortcomings of ShuffleNetV1.
MobileNetV3 is a new lightweight network based on NAS proposed by Google in 2019. To further improve the results, the relu and sigmoid activation functions were replaced with hard_swish and hard_sigmoid, and several other strategies were introduced to reduce the amount of computation.
GhostNet is a brand-new lightweight network structure proposed by Huawei in 2020. By introducing the ghost module, it greatly alleviates the redundant computation of features in traditional deep networks, which significantly reduces the number of parameters and the amount of computation.
![](../../images/models/mobile_arm_top1.png)
![](../../images/models/mobile_arm_storage.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png)
Currently there are 32 pretrained models of the mobile series open-sourced by PaddleClas, and their indicators are shown in the figures above. As can be seen, newer lightweight models tend to perform better, and MobileNetV3 represents the latest lightweight neural network architecture. In MobileNetV3, the authors used a 1x1 convolution after global average pooling in order to obtain higher accuracy. This operation significantly increases the number of parameters but has little impact on the amount of computation, so if the model is evaluated purely from a storage perspective, MobileNetV3 does not have much of an advantage; however, its smaller computation gives it a faster inference speed. In addition, the SSLD distillation models in our model library perform excellently, refreshing the accuracy of current lightweight models from various perspectives. Because the MobileNetV3 model has a complex structure with many branches and is not GPU friendly, its GPU inference speed is not as good as that of MobileNetV1.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 0.514 | 0.755 | 0.506 | | 0.070 | 0.460 |
| MobileNetV1_x0_5 | 0.635 | 0.847 | 0.637 | | 0.280 | 1.310 |
| MobileNetV1_x0_75 | 0.688 | 0.882 | 0.684 | | 0.630 | 2.550 |
| MobileNetV1 | 0.710 | 0.897 | 0.706 | | 1.110 | 4.190 |
| MobileNetV1_ssld | 0.779 | 0.939 | | | 1.110 | 4.190 |
| MobileNetV2_x0_25 | 0.532 | 0.765 | | | 0.050 | 1.500 |
| MobileNetV2_x0_5 | 0.650 | 0.857 | 0.654 | 0.864 | 0.170 | 1.930 |
| MobileNetV2_x0_75 | 0.698 | 0.890 | 0.698 | 0.896 | 0.350 | 2.580 |
| MobileNetV2 | 0.722 | 0.907 | 0.718 | 0.910 | 0.600 | 3.440 |
| MobileNetV2_x1_5 | 0.741 | 0.917 | | | 1.320 | 6.760 |
| MobileNetV2_x2_0 | 0.752 | 0.926 | | | 2.320 | 11.130 |
| MobileNetV2_ssld | 0.7674 | 0.9339 | | | 0.600 | 3.440 |
| MobileNetV3_large_<br>x1_25 | 0.764 | 0.930 | 0.766 | | 0.714 | 7.440 |
| MobileNetV3_large_<br>x1_0 | 0.753 | 0.923 | 0.752 | | 0.450 | 5.470 |
| MobileNetV3_large_<br>x0_75 | 0.731 | 0.911 | 0.733 | | 0.296 | 3.910 |
| MobileNetV3_large_<br>x0_5 | 0.692 | 0.885 | 0.688 | | 0.138 | 2.670 |
| MobileNetV3_large_<br>x0_35 | 0.643 | 0.855 | 0.642 | | 0.077 | 2.100 |
| MobileNetV3_small_<br>x1_25 | 0.707 | 0.895 | 0.704 | | 0.195 | 3.620 |
| MobileNetV3_small_<br>x1_0 | 0.682 | 0.881 | 0.675 | | 0.123 | 2.940 |
| MobileNetV3_small_<br>x0_75 | 0.660 | 0.863 | 0.654 | | 0.088 | 2.370 |
| MobileNetV3_small_<br>x0_5 | 0.592 | 0.815 | 0.580 | | 0.043 | 1.900 |
| MobileNetV3_small_<br>x0_35 | 0.530 | 0.764 | 0.498 | | 0.026 | 1.660 |
| MobileNetV3_small_<br>x0_35_ssld | 0.556 | 0.777 | 0.498 | | 0.026 | 1.660 |
| MobileNetV3_large_<br>x1_0_ssld | 0.790 | 0.945 | | | 0.450 | 5.470 |
| MobileNetV3_large_<br>x1_0_ssld_int8 | 0.761 | | | | | |
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.901 | | | 0.123 | 2.940 |
| ShuffleNetV2 | 0.688 | 0.885 | 0.694 | | 0.280 | 2.260 |
| ShuffleNetV2_x0_25 | 0.499 | 0.738 | | | 0.030 | 0.600 |
| ShuffleNetV2_x0_33 | 0.537 | 0.771 | | | 0.040 | 0.640 |
| ShuffleNetV2_x0_5 | 0.603 | 0.823 | 0.603 | | 0.080 | 1.360 |
| ShuffleNetV2_x1_5 | 0.716 | 0.902 | 0.726 | | 0.580 | 3.470 |
| ShuffleNetV2_x2_0 | 0.732 | 0.912 | 0.749 | | 1.120 | 7.320 |
| ShuffleNetV2_swish | 0.700 | 0.892 | | | 0.290 | 2.260 |
| GhostNet_x0_5 | 0.668 | 0.869 | 0.662 | 0.866 | 0.082 | 2.600 |
| GhostNet_x1_0 | 0.740 | 0.916 | 0.739 | 0.914 | 0.294 | 5.200 |
| GhostNet_x1_3 | 0.757 | 0.925 | 0.757 | 0.927 | 0.440 | 7.300 |
| GhostNet_x1_3_ssld | 0.794 | 0.945 | 0.757 | 0.927 | 0.440 | 7.300 |
<a name='3'></a>
## 3. Inference speed and storage size based on SD855
| Models | Batch Size=1<br>(ms) | Storage Size<br>(MB) |
|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 3.220 | 1.900 |
| MobileNetV1_x0_5 | 9.580 | 5.200 |
| MobileNetV1_x0_75 | 19.436 | 10.000 |
| MobileNetV1 | 32.523 | 16.000 |
| MobileNetV1_ssld | 32.523 | 16.000 |
| MobileNetV2_x0_25 | 3.799 | 6.100 |
| MobileNetV2_x0_5 | 8.702 | 7.800 |
| MobileNetV2_x0_75 | 15.531 | 10.000 |
| MobileNetV2 | 23.318 | 14.000 |
| MobileNetV2_x1_5 | 45.624 | 26.000 |
| MobileNetV2_x2_0 | 74.292 | 43.000 |
| MobileNetV2_ssld | 23.318 | 14.000 |
| MobileNetV3_large_x1_25 | 28.218 | 29.000 |
| MobileNetV3_large_x1_0 | 19.308 | 21.000 |
| MobileNetV3_large_x0_75 | 13.565 | 16.000 |
| MobileNetV3_large_x0_5 | 7.493 | 11.000 |
| MobileNetV3_large_x0_35 | 5.137 | 8.600 |
| MobileNetV3_small_x1_25 | 9.275 | 14.000 |
| MobileNetV3_small_x1_0 | 6.546 | 12.000 |
| MobileNetV3_small_x0_75 | 5.284 | 9.600 |
| MobileNetV3_small_x0_5 | 3.352 | 7.800 |
| MobileNetV3_small_x0_35 | 2.635 | 6.900 |
| MobileNetV3_small_x0_35_ssld | 2.635 | 6.900 |
| MobileNetV3_large_x1_0_ssld | 19.308 | 21.000 |
| MobileNetV3_large_x1_0_ssld_int8 | 14.395 | 10.000 |
| MobileNetV3_small_x1_0_ssld | 6.546 | 12.000 |
| ShuffleNetV2 | 10.941 | 9.000 |
| ShuffleNetV2_x0_25 | 2.329 | 2.700 |
| ShuffleNetV2_x0_33 | 2.643 | 2.800 |
| ShuffleNetV2_x0_5 | 4.261 | 5.600 |
| ShuffleNetV2_x1_5 | 19.352 | 14.000 |
| ShuffleNetV2_x2_0 | 34.770 | 28.000 |
| ShuffleNetV2_swish | 16.023 | 9.100 |
| GhostNet_x0_5 | 5.714 | 10.000 |
| GhostNet_x1_0 | 13.558 | 20.000 |
| GhostNet_x1_3 | 19.982 | 29.000 |
| GhostNet_x1_3_ssld | 19.982 | 29.000 |
<a name='4'></a>
## 4. Inference speed based on T4 GPU
| Models | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-----------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| MobileNetV1_x0_25 | 0.68422 | 1.13021 | 1.72095 | 0.67274 | 1.226 | 1.84096 |
| MobileNetV1_x0_5 | 0.69326 | 1.09027 | 1.84746 | 0.69947 | 1.43045 | 2.39353 |
| MobileNetV1_x0_75 | 0.6793 | 1.29524 | 2.15495 | 0.79844 | 1.86205 | 3.064 |
| MobileNetV1 | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 |
| MobileNetV1_ssld | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 |
| MobileNetV2_x0_25 | 2.85399 | 3.62405 | 4.29952 | 2.81989 | 3.52695 | 4.2432 |
| MobileNetV2_x0_5 | 2.84258 | 3.1511 | 4.10267 | 2.80264 | 3.65284 | 4.31737 |
| MobileNetV2_x0_75 | 2.82183 | 3.27622 | 4.98161 | 2.86538 | 3.55198 | 5.10678 |
| MobileNetV2 | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 |
| MobileNetV2_x1_5 | 2.81852 | 4.87434 | 8.97934 | 2.79398 | 5.30149 | 9.30899 |
| MobileNetV2_x2_0 | 3.65197 | 6.32329 | 11.644 | 3.29788 | 7.08644 | 12.45375 |
| MobileNetV2_ssld | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 |
| MobileNetV3_large_x1_25 | 2.34387 | 3.16103 | 4.79742 | 2.35117 | 3.44903 | 5.45658 |
| MobileNetV3_large_x1_0 | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
| MobileNetV3_large_x0_75 | 2.1058 | 2.61426 | 3.61021 | 2.0006 | 2.56987 | 3.78005 |
| MobileNetV3_large_x0_5 | 2.06934 | 2.77341 | 3.35313 | 2.11199 | 2.88172 | 3.19029 |
| MobileNetV3_large_x0_35 | 2.14965 | 2.7868 | 3.36145 | 1.9041 | 2.62951 | 3.26036 |
| MobileNetV3_small_x1_25 | 2.06817 | 2.90193 | 3.5245 | 2.02916 | 2.91866 | 3.34528 |
| MobileNetV3_small_x1_0 | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
| MobileNetV3_small_x0_75 | 1.80617 | 2.64646 | 3.24513 | 1.93697 | 2.64285 | 3.32797 |
| MobileNetV3_small_x0_5 | 1.95001 | 2.74014 | 3.39485 | 1.88406 | 2.99601 | 3.3908 |
| MobileNetV3_small_x0_35 | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 |
| MobileNetV3_small_x0_35_ssld | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 |
| MobileNetV3_large_x1_0_ssld | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
| MobileNetV3_small_x1_0_ssld | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
| ShuffleNetV2 | 1.95064 | 2.15928 | 2.97169 | 1.89436 | 2.26339 | 3.17615 |
| ShuffleNetV2_x0_25 | 1.43242 | 2.38172 | 2.96768 | 1.48698 | 2.29085 | 2.90284 |
| ShuffleNetV2_x0_33 | 1.69008 | 2.65706 | 2.97373 | 1.75526 | 2.85557 | 3.09688 |
| ShuffleNetV2_x0_5 | 1.48073 | 2.28174 | 2.85436 | 1.59055 | 2.18708 | 3.09141 |
| ShuffleNetV2_x1_5 | 1.51054 | 2.4565 | 3.41738 | 1.45389 | 2.5203 | 3.99872 |
| ShuffleNetV2_x2_0 | 1.95616 | 2.44751 | 4.19173 | 2.15654 | 3.18247 | 5.46893 |
| ShuffleNetV2_swish | 2.50213 | 2.92881 | 3.474 | 2.5129 | 2.97422 | 3.69357 |
| GhostNet_x0_5 | 2.64492 | 3.48473 | 4.48844 | 2.36115 | 3.52802 | 3.89444 |
| GhostNet_x1_0 | 2.63120 | 3.92065 | 4.48296 | 2.57042 | 3.56296 | 4.85524 |
| GhostNet_x1_3 | 2.89715 | 3.80329 | 4.81661 | 2.81810 | 3.72071 | 5.92269 |

@ -0,0 +1,64 @@
# Other networks
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)
<a name='1'></a>
## 1. Overview
In 2012, the AlexNet network proposed by Alex Krizhevsky et al. won the ImageNet competition, far surpassing the second place, and convolutional neural networks, and deep learning in general, attracted wide attention. AlexNet used ReLU as the activation function to alleviate the vanishing-gradient problem of sigmoid in deep networks. During training, Dropout was used to randomly drop a portion of the neurons, avoiding overfitting. In the network, overlapping max pooling replaced the average pooling commonly used in CNNs, which avoids the blurring effect of average pooling and improves feature richness. In a sense, AlexNet ignited the research and application of neural networks.
SqueezeNet achieved the same precision as AlexNet on ImageNet-1k, but with only 1/50 of the parameters. The core of the network is the Fire module, which uses 1x1 convolutions for channel dimensionality reduction, greatly saving parameters. The authors created SqueezeNet by stacking a large number of Fire modules.
VGG is a convolutional neural network developed by researchers at Oxford University's Visual Geometry Group and DeepMind. The network explores the relationship between the depth of a convolutional neural network and its performance. By repeatedly stacking 3x3 convolution kernels and 2x2 max-pooling layers, a multi-layer convolutional neural network is successfully constructed and achieves good convergence accuracy. In the end, VGG finished runner-up in the ILSVRC 2014 classification task and won the localization task.
DarkNet53 was designed by the author of YOLO as a backbone for object detection. The network is mainly composed of 1x1 and 3x3 convolution kernels, with 53 layers in total, hence the name DarkNet53.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| AlexNet | 0.567 | 0.792 | 0.5720 | | 1.370 | 61.090 |
| SqueezeNet1_0 | 0.596 | 0.817 | 0.575 | | 1.550 | 1.240 |
| SqueezeNet1_1 | 0.601 | 0.819 | | | 0.690 | 1.230 |
| VGG11 | 0.693 | 0.891 | | | 15.090 | 132.850 |
| VGG13 | 0.700 | 0.894 | | | 22.480 | 133.030 |
| VGG16 | 0.720 | 0.907 | 0.715 | 0.901 | 30.810 | 138.340 |
| VGG19 | 0.726 | 0.909 | | | 39.130 | 143.650 |
| DarkNet53 | 0.780 | 0.941 | 0.772 | 0.938 | 18.580 | 41.600 |
<a name='3'></a>
## 3. Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|---------------------------|-----------|-------------------|----------------------|
| AlexNet | 224 | 256 | 1.176 |
| SqueezeNet1_0 | 224 | 256 | 0.860 |
| SqueezeNet1_1 | 224 | 256 | 0.763 |
| VGG11 | 224 | 256 | 1.867 |
| VGG13 | 224 | 256 | 2.148 |
| VGG16 | 224 | 256 | 2.616 |
| VGG19 | 224 | 256 | 3.076 |
| DarkNet53 | 256 | 256 | 3.139 |
<a name='4'></a>
## 4. Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| AlexNet | 224 | 256 | 1.06447 | 1.70435 | 2.38402 | 1.44993 | 2.46696 | 3.72085 |
| SqueezeNet1_0 | 224 | 256 | 0.97162 | 2.06719 | 3.67499 | 0.96736 | 2.53221 | 4.54047 |
| SqueezeNet1_1 | 224 | 256 | 0.81378 | 1.62919 | 2.68044 | 0.76032 | 1.877 | 3.15298 |
| VGG11 | 224 | 256 | 2.24408 | 4.67794 | 7.6568 | 3.90412 | 9.51147 | 17.14168 |
| VGG13 | 224 | 256 | 2.58589 | 5.82708 | 10.03591 | 4.64684 | 12.61558 | 23.70015 |
| VGG16 | 224 | 256 | 3.13237 | 7.19257 | 12.50913 | 5.61769 | 16.40064 | 32.03939 |
| VGG19 | 224 | 256 | 3.69987 | 8.59168 | 15.07866 | 6.65221 | 20.4334 | 41.55902 |
| DarkNet53 | 256 | 256 | 3.18101 | 5.88419 | 10.14964 | 4.10829 | 12.1714 | 22.15266 |

@ -0,0 +1,169 @@
# PP-LCNet Series
---
## Catalogue
- [1. Abstract](#1)
- [2. Introduction](#2)
- [3. Method](#3)
- [3.1 Better Activation Function](#3.1)
- [3.2 SE Modules at Appropriate Positions](#3.2)
- [3.3 Larger Convolution Kernels](#3.3)
- [3.4 Larger Dimensional 1 × 1 Conv Layer after GAP](#3.4)
- [4. Experiments](#4)
- [4.1 Image Classification](#4.1)
- [4.2 Object Detection](#4.2)
- [4.3 Semantic Segmentation](#4.3)
- [5. Conclusion](#5)
- [6. Reference](#6)
<a name="1"></a>
## 1. Abstract
In the field of computer vision, the quality of the backbone network determines the outcome of the whole vision task. In previous studies, researchers generally focused on the optimization of FLOPs or Params, but inference speed actually serves as an important indicator of model quality in real-world scenarios. Nevertheless, it is difficult to balance inference speed and accuracy. In view of the various CPU-based applications in industry, we are now working to improve the adaptability of the backbone network to Intel CPUs, so as to obtain a faster and more accurate lightweight backbone network. At the same time, the performance of downstream vision tasks such as object detection and semantic segmentation is also improved.
<a name="2"></a>
## 2. Introduction
Recent years have witnessed the emergence of many lightweight backbone networks. In the past two years, in particular, abundant networks searched by NAS either enjoy advantages in FLOPs or Params, or have an edge in inference speed on ARM devices. However, few of them are dedicated to optimization for Intel CPUs, resulting in suboptimal inference speed on the Intel CPU side. Based on this, we specially designed the backbone network PP-LCNet for Intel CPU devices and its acceleration library MKLDNN. Compared with other lightweight SOTA models, this backbone network can further improve model performance without increasing inference time, significantly outperforming existing SOTA models. A comparison chart with other models is shown below.
![](../../images/PP-LCNet/PP-LCNet-Acc.png)
<a name="3"></a>
## 3. Method
The overall structure of the network is shown in the figure below.
![](../../images/PP-LCNet/PP-LCNet.png)
Based on extensive experiments, we found that many seemingly inexpensive operations actually increase latency on Intel CPU-based devices, especially when the MKLDNN acceleration library is enabled. Therefore, we finally chose a block with the leanest possible structure and the fastest possible speed to form our BaseNet (similar to MobileNetV1). Based on BaseNet, we summarized four strategies that improve the accuracy of the model without increasing latency, and we combined these four strategies to form PP-LCNet. Each of the four strategies is introduced below:
<a name="3.1"></a>
### 3.1 Better Activation Function
Since convolutional neural networks adopted the ReLU activation function, network performance has improved substantially, and variants of ReLU have appeared in recent years, such as Leaky-ReLU, P-ReLU, ELU, etc. In 2017, Google Brain obtained the Swish activation function through search, and it performs well on lightweight networks. In 2019, the authors of MobileNetV3 further optimized this activation function into H-Swish, which removes the exponential operation, leading to faster speed with almost no loss of accuracy. After many experiments, we also recognized its excellent performance on lightweight networks. Therefore, this activation function is adopted in PP-LCNet.
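For reference, H-Swish replaces the exponential in Swish with a piecewise-linear approximation; below is a small NumPy sketch of both functions, following the formulas swish(x) = x·sigmoid(x) and h_swish(x) = x·ReLU6(x + 3)/6 from the MobileNetV3 paper (the sample inputs are illustrative).
```python
import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))                   # x * sigmoid(x)

def h_swish(x):
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0     # x * ReLU6(x + 3) / 6, no exponential

x = np.linspace(-4, 4, 9)
print(np.round(swish(x), 3))
print(np.round(h_swish(x), 3))
```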
<a name="3.2"></a>
### 3.2 SE Modules at Appropriate Positions
The SE module is a channel attention mechanism proposed by SENet, which can effectively improve the accuracy of the model. However, on the Intel CPU side, the module also introduces a large latency, leaving us the task of balancing accuracy and speed. In NAS-searched networks such as MobileNetV3, the locations of the SE module show no general pattern, but we found through our experiments that the closer the SE module is to the tail of the network, the greater the improvement in model accuracy. The following table shows some of our experimental results, and a minimal sketch of what the SE module computes is given after the table.
| SE Location | Top-1 Acc(\%) | Latency(ms) |
|-------------------|---------------|-------------|
| 1100000000000 | 61.73 | 2.06 |
| 0000001100000 | 62.17 | 2.03 |
| <b>0000000000011</b> | <b>63.14</b> | <b>2.05</b> |
| 1111111111111 | 64.27 | 3.80 |
The option in the third row of the table was chosen for the location of the SE module in PP-LCNet.
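Below is a minimal NumPy sketch of what an SE block computes: global average pooling, a two-layer bottleneck, and per-channel rescaling. The weights are random placeholders, the reduction ratio is illustrative, and details (e.g. the hard-sigmoid variant used in some lightweight networks) may differ from PP-LCNet's actual implementation.
```python
import numpy as np

def se_block(x, reduction=4):
    """Squeeze-and-Excitation on a feature map of shape (C, H, W)."""
    rng = np.random.default_rng(0)
    c = x.shape[0]
    w1 = rng.standard_normal((c // reduction, c)) * 0.1   # placeholder reduce weights
    w2 = rng.standard_normal((c, c // reduction)) * 0.1   # placeholder expand weights
    squeeze = x.mean(axis=(1, 2))                  # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)         # reduce + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # expand + sigmoid -> per-channel scale in (0, 1)
    return x * scale[:, None, None]                # reweight each channel

feat = np.random.randn(16, 14, 14)
print(se_block(feat).shape)  # (16, 14, 14)
```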
<a name="3.3"></a>
### 3.3 Larger Convolution Kernels
In the MixNet paper, the authors analyze the effect of convolutional kernel size on model performance and conclude that larger convolutional kernels within a certain range can improve performance, but beyond this range they hurt it. So the authors form MixConv by splitting channels, convolving them with kernels of different sizes, and concatenating the results, which improves accuracy but is not friendly to inference. Our experiments show that the effect of placing larger convolutional kernels at different positions is similar to that of the SE module: larger kernels play a more prominent role in the middle and at the tail of the network. The following table shows the effect of the position of 5x5 convolutional kernels on accuracy.
| Larger Convolution Location | Top-1 Acc(\%) | Latency(ms) |
|----------------------------|---------------|-------------|
| 1111111111111 | 63.22 | 2.08 |
| 1111111000000 | 62.70 | 2.07 |
| <b>0000001111111</b> | <b>63.14</b> | <b>2.05</b> |
Experiments show that placing larger convolutional kernels only in the middle and at the tail of the network achieves the same accuracy as placing them at all positions, with faster inference. The option in the third row of the table was the final choice of PP-LCNet.
<a name="3.4"></a>
### 3.4 Larger Dimensional 1 × 1 Conv Layer after GAP
Since the introduction of GoogLeNet, GAP (Global Average Pooling) has often been directly followed by the classification layer, so in lightweight networks the features extracted after GAP are not further integrated and processed. If a larger 1x1 convolutional layer (equivalent to an FC layer) is used after GAP, the extracted features are first integrated before being classified, instead of passing directly through the classification layer. This can greatly improve accuracy without affecting the inference speed of the model. The above four improvements were applied to BaseNet to obtain PP-LCNet. The following table further illustrates the impact of each scheme on the results.
| Activation | SE-block | Large-kernel | last-1x1-conv | Top-1 Acc(\%) | Latency(ms) |
|------------|----------|--------------|---------------|---------------|-------------|
| 0 | 1 | 1 | 1 | 61.93 | 1.94 |
| 1 | 0 | 1 | 1 | 62.51 | 1.87 |
| 1 | 1 | 0 | 1 | 62.44 | 2.01 |
| 1 | 1 | 1 | 0 | 59.91 | 1.85 |
| <b>1</b> | <b>1</b> | <b>1</b> | <b>1</b> | <b>63.14</b> | <b>2.05</b> |
<a name="4"></a>
## 4. Experiments
<a name="4.1"></a>
### 4.1 Image Classification
For image classification, the ImageNet dataset is adopted. Compared with current mainstream lightweight networks, PP-LCNet obtains faster inference speed at the same accuracy. When using Baidu's self-developed SSLD distillation strategy, the accuracy is further improved, with the ImageNet Top-1 Acc exceeding 80% at an inference speed of about 5 ms on the Intel CPU side.
| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) |
|-------|-----------|----------|---------------|---------------|-------------|
| PP-LCNet-0.25x | 1.5 | 18 | 51.86 | 75.65 | 1.74 |
| PP-LCNet-0.35x | 1.6 | 29 | 58.09 | 80.83 | 1.92 |
| PP-LCNet-0.5x | 1.9 | 47 | 63.14 | 84.66 | 2.05 |
| PP-LCNet-0.75x | 2.4 | 99 | 68.18 | 88.30 | 2.29 |
| PP-LCNet-1x | 3.0 | 161 | 71.32 | 90.03 | 2.46 |
| PP-LCNet-1.5x | 4.5 | 342 | 73.71 | 91.53 | 3.19 |
| PP-LCNet-2x | 6.5 | 590 | 75.18 | 92.27 | 4.27 |
| PP-LCNet-2.5x | 9.0 | 906 | 76.60 | 93.00 | 5.39 |
| PP-LCNet-0.5x\* | 1.9 | 47 | 66.10 | 86.46 | 2.05 |
| PP-LCNet-1.0x\* | 3.0 | 161 | 74.39 | 92.09 | 2.46 |
| PP-LCNet-2.5x\* | 9.0 | 906 | 80.82 | 95.33 | 5.39 |
\* denotes the model after using SSLD distillation.
Performance comparison with other lightweight networks:
| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) |
|-------|-----------|----------|---------------|---------------|-------------|
| MobileNetV2-0.25x | 1.5 | 34 | 53.21 | 76.52 | 2.47 |
| MobileNetV3-small-0.35x | 1.7 | 15 | 53.03 | 76.37 | 3.02 |
| ShuffleNetV2-0.33x | 0.6 | 24 | 53.73 | 77.05 | 4.30 |
| <b>PP-LCNet-0.25x</b> | <b>1.5</b> | <b>18</b> | <b>51.86</b> | <b>75.65</b> | <b>1.74</b> |
| MobileNetV2-0.5x | 2.0 | 99 | 65.03 | 85.72 | 2.85 |
| MobileNetV3-large-0.35x | 2.1 | 41 | 64.32 | 85.46 | 3.68 |
| ShuffleNetV2-0.5x | 1.4 | 43 | 60.32 | 82.26 | 4.65 |
| <b>PP-LCNet-0.5x</b> | <b>1.9</b> | <b>47</b> | <b>63.14</b> | <b>84.66</b> | <b>2.05</b> |
| MobileNetV1-1x | 4.3 | 578 | 70.99 | 89.68 | 3.38 |
| MobileNetV2-1x | 3.5 | 327 | 72.15 | 90.65 | 4.26 |
| MobileNetV3-small-1.25x | 3.6 | 100 | 70.67 | 89.51 | 3.95 |
| <b>PP-LCNet-1x</b> | <b>3.0</b> | <b>161</b> | <b>71.32</b> | <b>90.03</b> | <b>2.46</b> |
<a name="4.2"></a>
### 4.2 Object Detection
For object detection, we adopt Baidu's self-developed PicoDet, which focuses on lightweight object detection scenarios. The following table shows the comparison between PP-LCNet and MobileNetV3 backbones on the COCO dataset. PP-LCNet has an obvious advantage in both accuracy and speed.
| Backbone | mAP(%) | Latency(ms) |
|-------|-----------|----------|
| MobileNetV3-large-0.35x | 19.2 | 8.1 |
| <b>PP-LCNet-0.5x</b> | <b>20.3</b> | <b>6.0</b> |
| MobileNetV3-large-0.75x | 25.8 | 11.1 |
| <b>PP-LCNet-1x</b> | <b>26.9</b> | <b>7.9</b> |
<a name="4.3"></a>
### 4.3 Semantic Segmentation
For semantic segmentation, DeeplabV3+ is adopted. The following table presents the comparison between PP-LCNet and MobileNetV3 on the Cityscapes dataset, and PP-LCNet also stands out in terms of accuracy and speed.
| Backbone | mIoU(%) | Latency(ms) |
|-------|-----------|----------|
| MobileNetV3-large-0.5x | 55.42 | 135 |
| <b>PP-LCNet-0.5x</b> | <b>58.36</b> | <b>82</b> |
| MobileNetV3-large-0.75x | 64.53 | 151 |
| <b>PP-LCNet-1x</b> | <b>66.03</b> | <b>96</b> |
<a name="5"></a>
## 5. Conclusion
Rather than holding on to perfect FLOPs and Params as academics do, PP-LCNet focuses on analyzing how to add Intel CPU-friendly modules to improve the performance of the model, which can better balance accuracy and inference time. The experimental conclusions therein are available to other researchers in network structure design, while providing NAS search researchers with a smaller search space and general conclusions. The finished PP-LCNet can also be better accepted and applied in industry.
<a name="6"></a>
## 6. Reference
Reference to cite when you use PP-LCNet in a paper:
```
@misc{cui2021pplcnet,
title={PP-LCNet: A Lightweight CPU Convolutional Neural Network},
author={Cheng Cui and Tingquan Gao and Shengyu Wei and Yuning Du and Ruoyu Guo and Shuilong Dong and Bin Lu and Ying Zhou and Xueying Lv and Qiwen Liu and Xiaoguang Hu and Dianhai Yu and Yanjun Ma},
year={2021},
eprint={2109.15099},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```

@ -0,0 +1,26 @@
# PVTV2
---
## Content
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
PVTV2 is a Vision Transformer series model built on PVT (Pyramid Vision Transformer). PVT uses Transformer blocks to build a feature pyramid network. The main designs of PVTV2 are: (1) overlapping patch embedding, (2) convolutional feed-forward networks, and (3) linear-complexity attention layers. [Paper](https://arxiv.org/pdf/2106.13797.pdf).
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| PVT_V2_B0 | 0.705 | 0.902 | 0.705 | - | 0.53 | 3.7 |
| PVT_V2_B1 | 0.787 | 0.945 | 0.787 | - | 2.0 | 14.0 |
| PVT_V2_B2 | 0.821 | 0.960 | 0.820 | - | 3.9 | 25.4 |
| PVT_V2_B3 | 0.831 | 0.965 | 0.831 | - | 6.7 | 45.2 |
| PVT_V2_B4 | 0.836 | 0.967 | 0.836 | - | 9.8 | 62.6 |
| PVT_V2_B5 | 0.837 | 0.966 | 0.838 | - | 11.4 | 82.0 |
| PVT_V2_B2_Linear | 0.821 | 0.961 | 0.821 | - | 3.8 | 22.6 |

@ -0,0 +1,24 @@
# ReXNet series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
ReXNet is proposed by NAVER AI Lab and is based on new network design principles. Aiming at the representational bottleneck problem in existing networks, a set of design principles is proposed. The authors believe that conventional designs produce representational bottlenecks, which affect model performance. To investigate the representational bottleneck, the authors study the matrix rank of the features generated by ten thousand random networks. Besides, the channel configuration across entire layers is also studied to design more accurate network architectures. In the end, the authors propose a set of simple and effective design principles to mitigate the representational bottleneck. [paper](https://arxiv.org/pdf/2007.00992.pdf)
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | FLOPs<br/>(G) | Params<br/>(M) |
| :--------: | :---: | :---: | :---------------: | :-----------: | -------------- |
| ReXNet_1_0 | 77.46 | 93.70 | 77.9 | 0.415 | 4.838 |
| ReXNet_1_3 | 79.13 | 94.64 | 79.5 | 0.683 | 7.611 |
| ReXNet_1_5 | 80.06 | 95.12 | 80.3 | 0.900 | 9.791 |
| ReXNet_2_0 | 81.22 | 95.36 | 81.6 | 1.561 | 16.449 |
| ReXNet_3_0 | 82.09 | 96.12 | 82.8 | 3.445 | 34.833 |
Inference speed and other information are coming soon.

@ -0,0 +1,22 @@
# RedNet series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
RedNet replaces the convolution at all bottleneck positions in the ResNet backbone with involution, while the convolutions used for channel mapping and fusion are retained. These carefully redesigned components combine to form a new, efficient backbone network called RedNet. [paper](https://arxiv.org/abs/2103.06255).
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
| RedNet26 | 9.2 | 1.7 | 75.95 | 93.19 |
| RedNet38 | 12.4 | 2.2 | 77.47 | 93.56 |
| RedNet50 | 15.5 | 2.7 | 78.33 | 94.17 |
| RedNet101 | 25.7 | 4.7 | 78.94 | 94.36 |
| RedNet152 | 34.0 | 6.8 | 79.17 | 94.40 |

@ -0,0 +1,29 @@
# RepVGG series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
The RepVGG (Making VGG-style ConvNets Great Again) series is a simple but powerful convolutional neural network architecture proposed in 2021 by Tsinghua University (Guiguang Ding's team), MEGVII Technology (Jian Sun et al.), HKUST and Aberystwyth University. The inference-time architecture is VGG-like, with a body composed of nothing but a stack of 3x3 convolutions and ReLU, while the training-time model has a multi-branch topology. The decoupling of the training-time and inference-time architectures is realized by a re-parameterization technique, hence the name RepVGG. [paper](https://arxiv.org/abs/2101.03697).
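To make the re-parameterization idea concrete, here is a hedged NumPy sketch of how a 1x1 branch and an identity branch can be folded into a single 3x3 kernel at inference time; batch-norm fusion, which the real RepVGG also performs, is omitted for brevity, and the shapes are illustrative.
```python
import numpy as np

def reparameterize(k3x3, k1x1, channels):
    """Fuse 3x3 conv + 1x1 conv + identity branches into one 3x3 kernel.

    k3x3: (C_out, C_in, 3, 3); k1x1: (C_out, C_in, 1, 1); the identity branch
    assumes C_out == C_in.
    """
    fused = k3x3.copy()
    fused[:, :, 1:2, 1:2] += k1x1        # a 1x1 kernel is a 3x3 kernel that only touches the center
    for c in range(channels):            # identity = per-channel 3x3 kernel with 1 at the center
        fused[c, c, 1, 1] += 1.0
    return fused

c = 8
k3 = np.random.randn(c, c, 3, 3)
k1 = np.random.randn(c, c, 1, 1)
print(reparameterize(k3, k1, c).shape)  # (8, 8, 3, 3): a single equivalent 3x3 kernel
```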
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1| FLOPs<br>(G) |
|:--:|:--:|:--:|:--:|:--:|
| RepVGG_A0 | 0.7131 | 0.9016 | 0.7241 | |
| RepVGG_A1 | 0.7380 | 0.9146 | 0.7446 | |
| RepVGG_A2 | 0.7571 | 0.9264 | 0.7648 | |
| RepVGG_B0 | 0.7450 | 0.9213 | 0.7514 | |
| RepVGG_B1 | 0.7773 | 0.9385 | 0.7837 | |
| RepVGG_B2 | 0.7813 | 0.9410 | 0.7878 | |
| RepVGG_B1g2 | 0.7732 | 0.9359 | 0.7778 | |
| RepVGG_B1g4 | 0.7675 | 0.9335 | 0.7758 | |
| RepVGG_B2g4 | 0.7881 | 0.9448 | 0.7938 | |
| RepVGG_B3g4 | 0.7965 | 0.9485 | 0.8021 | |
Params, FLOPs, Inference speed and other information are coming soon.

@ -0,0 +1,32 @@
# ResNeSt and RegNet series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on T4 GPU](#3)
<a name='1'></a>
## 1. Overview
The ResNeSt series was proposed in 2020. It improves the original ResNet structure by introducing K groups and adding an attention module similar to the SE block within the different groups. Its accuracy is higher than that of the base ResNet model, while the number of parameters and FLOPs are almost the same as the base ResNet.
RegNet was proposed by Facebook in 2020 to deepen the concept of design space. Based on AnyNetX, model performance is gradually improved through strategies such as a shared bottleneck ratio, a shared group width, and adjusting network depth or width. What's more, the design space structure is simplified and its interpretability is improved. The quality of the design space is improved while its diversity is maintained. Under similar conditions, the designed RegNet models perform better than EfficientNet and run up to 5 times faster.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNeSt50_fast_1s1x64d | 0.8035 | 0.9528| 0.8035 | -| 8.68 | 26.3 |
| ResNeSt50 | 0.8083 | 0.9542| 0.8113 | -| 10.78 | 27.5 |
| RegNetX_4GF | 0.7850 | 0.9416| 0.7860 | -| 8.0 | 22.1 |
<a name='3'></a>
## 3. Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| ResNeSt50_fast_1s1x64d | 224 | 256 | 3.46466 | 5.56647 | 9.11848 | 3.45405 | 8.72680 | 15.48710 |
| ResNeSt50 | 224 | 256 | 7.05851 | 8.97676 | 13.34704 | 6.16248 | 12.0633 | 21.49936 |
| RegNetX_4GF | 224 | 256 | 6.69042 | 8.01664 | 11.60608 | 6.46478 | 11.19862 | 16.89089 |

@ -0,0 +1,104 @@
# ResNet and ResNet_vd series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)
<a name='1'></a>
## 1. Overview
The ResNet series was proposed in 2015 and won the championship in the ILSVRC 2015 competition with a top-5 error rate of 3.57%. The network innovatively proposed the residual structure and built the ResNet network by stacking multiple residual blocks. Experiments show that using residual blocks effectively improves convergence speed and accuracy.
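The residual structure itself reduces to adding the block input back onto the output of a few stacked layers; a minimal sketch of the identity-shortcut form is given below, where the placeholder function `f` stands in for the stacked conv/BN/ReLU layers and the shapes are illustrative.
```python
import numpy as np

def residual_block(x, f):
    """y = ReLU(f(x) + x); `f` is the residual branch and must preserve the shape of x."""
    return np.maximum(f(x) + x, 0.0)

x = np.random.randn(64, 56, 56)
# placeholder residual branch: any shape-preserving transformation works for the sketch
out = residual_block(x, lambda t: 0.1 * t)
print(out.shape)  # (64, 56, 56)
```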
Joyce Xu of Stanford University calls ResNet one of three architectures that "really redefine the way we think about neural networks." Due to the outstanding performance of ResNet, more and more scholars and engineers from academia and industry have improved its structure. Well-known variants include Wide-ResNet, ResNet-vc, ResNet-vd, Res2Net, etc. The number of parameters and FLOPs of ResNet-vc and ResNet-vd are almost the same as those of ResNet, so we group them into the ResNet series here.
The ResNet-series models released this time include 14 pretrained models, such as ResNet50, ResNet50_vd, ResNet50_vd_ssld, and ResNet200_vd. At the training level, ResNet adopted the standard training procedure for ImageNet, while the improved models adopted additional training strategies: cosine decay is used for the learning rate schedule, label smoothing is applied as regularization, mixup is added to the data preprocessing, and the total number of iterations is increased from 120 epochs to 200 epochs.
Among them, ResNet50_vd_v2 and ResNet50_vd_ssld adopted knowledge distillation, which further improves accuracy while keeping the structure unchanged. Specifically, the teacher model of ResNet50_vd_v2 is ResNet152_vd (top-1 accuracy 80.59%), with ImageNet-1k as the training set; the teacher model of ResNet50_vd_ssld is ResNeXt101_32x16d_wsl (top-1 accuracy 84.2%), and its training set is the combination of 4 million images mined from ImageNet-22k and ImageNet-1k. The specific methods of knowledge distillation are being continuously updated.
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.ResNet.png)
As can be seen from the above curves, the higher the number of layers, the higher the accuracy, but the corresponding number of parameters, calculation and latency will increase. ResNet50_vd_ssld further improves the accuracy of top-1 of the ImageNet-1k validation set by using stronger teachers and more data, reaching 82.39%, refreshing the accuracy of ResNet50 series models.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNet18 | 0.710 | 0.899 | 0.696 | 0.891 | 3.660 | 11.690 |
| ResNet18_vd | 0.723 | 0.908 | | | 4.140 | 11.710 |
| ResNet34 | 0.746 | 0.921 | 0.732 | 0.913 | 7.360 | 21.800 |
| ResNet34_vd | 0.760 | 0.930 | | | 7.390 | 21.820 |
| ResNet34_vd_ssld | 0.797 | 0.949 | | | 7.390 | 21.820 |
| ResNet50 | 0.765 | 0.930 | 0.760 | 0.930 | 8.190 | 25.560 |
| ResNet50_vc | 0.784 | 0.940 | | | 8.670 | 25.580 |
| ResNet50_vd | 0.791 | 0.944 | 0.792 | 0.946 | 8.670 | 25.580 |
| ResNet50_vd_v2 | 0.798 | 0.949 | | | 8.670 | 25.580 |
| ResNet101 | 0.776 | 0.936 | 0.776 | 0.938 | 15.520 | 44.550 |
| ResNet101_vd | 0.802 | 0.950 | | | 16.100 | 44.570 |
| ResNet152 | 0.783 | 0.940 | 0.778 | 0.938 | 23.050 | 60.190 |
| ResNet152_vd | 0.806 | 0.953 | | | 23.530 | 60.210 |
| ResNet200_vd | 0.809 | 0.953 | | | 30.530 | 74.740 |
| ResNet50_vd_ssld | 0.824 | 0.961 | | | 8.670 | 25.580 |
| ResNet50_vd_ssld_v2 | 0.830 | 0.964 | | | 8.670 | 25.580 |
| Fix_ResNet50_vd_ssld_v2 | 0.840 | 0.970 | | | 17.696 | 25.580 |
| ResNet101_vd_ssld | 0.837 | 0.967 | | | 16.100 | 44.570 |
* Note: `ResNet50_vd_ssld_v2` is obtained by adding AutoAugment to the training process on the basis of the `ResNet50_vd_ssld` training strategy. `Fix_ResNet50_vd_ssld_v2` freezes all parameters of `ResNet50_vd_ssld_v2` except the FC layer and is fine-tuned on the ImageNet-1k dataset at a resolution of 320x320.
<a name='3'></a>
## 3. Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| ResNet18 | 224 | 256 | 1.499 |
| ResNet18_vd | 224 | 256 | 1.603 |
| ResNet34 | 224 | 256 | 2.272 |
| ResNet34_vd | 224 | 256 | 2.343 |
| ResNet34_vd_ssld | 224 | 256 | 2.343 |
| ResNet50 | 224 | 256 | 2.939 |
| ResNet50_vc | 224 | 256 | 3.041 |
| ResNet50_vd | 224 | 256 | 3.165 |
| ResNet50_vd_v2 | 224 | 256 | 3.165 |
| ResNet101 | 224 | 256 | 5.314 |
| ResNet101_vd | 224 | 256 | 5.252 |
| ResNet152 | 224 | 256 | 7.205 |
| ResNet152_vd | 224 | 256 | 7.200 |
| ResNet200_vd | 224 | 256 | 8.885 |
| ResNet50_vd_ssld | 224 | 256 | 3.165 |
| ResNet101_vd_ssld | 224 | 256 | 5.252 |
<a name='4'></a>
## 4. Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| ResNet18 | 224 | 256 | 1.3568 | 2.5225 | 3.61904 | 1.45606 | 3.56305 | 6.28798 |
| ResNet18_vd | 224 | 256 | 1.39593 | 2.69063 | 3.88267 | 1.54557 | 3.85363 | 6.88121 |
| ResNet34 | 224 | 256 | 2.23092 | 4.10205 | 5.54904 | 2.34957 | 5.89821 | 10.73451 |
| ResNet34_vd | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
| ResNet50 | 224 | 256 | 2.63824 | 4.63802 | 7.02444 | 3.47712 | 7.84421 | 13.90633 |
| ResNet50_vc | 224 | 256 | 2.67064 | 4.72372 | 7.17204 | 3.52346 | 8.10725 | 14.45577 |
| ResNet50_vd | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| ResNet50_vd_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| ResNet101 | 224 | 256 | 5.04037 | 7.73673 | 10.8936 | 6.07125 | 13.40573 | 24.3597 |
| ResNet101_vd | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
| ResNet152 | 224 | 256 | 7.28665 | 10.62001 | 14.90317 | 8.50198 | 19.17073 | 35.78384 |
| ResNet152_vd | 224 | 256 | 7.29127 | 10.86137 | 15.32444 | 8.54376 | 19.52157 | 36.64445 |
| ResNet200_vd | 224 | 256 | 9.36026 | 13.5474 | 19.0725 | 10.80619 | 25.01731 | 48.81399 |
| ResNet50_vd_ssld | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| ResNet50_vd_ssld_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
| Fix_ResNet50_vd_ssld_v2 | 320 | 320 | 3.42818 | 7.51534 | 13.19370 | 5.07696 | 14.64218 | 27.01453 |
| ResNet101_vd_ssld | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |

@ -0,0 +1,126 @@
# SEResNeXt and Res2Net series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)
<a name='1'></a>
## 1. Overview
ResNeXt, one of the typical variants of ResNet, was presented at the CVPR conference in 2017. Prior to this, methods to improve model accuracy mainly focused on deepening or widening the network, which increased the number of parameters and the amount of computation, and slowed down inference accordingly. The concept of cardinality was proposed in the ResNeXt structure. Through experiments, the authors found that increasing the number of channel groups is more effective than increasing depth or width: it improves accuracy without increasing parameter complexity and can even reduce the number of parameters, so ResNeXt is a rather successful variant of ResNet.
SENet is the winner of the 2017 ImageNet classification competition. It proposes a new SE structure that can be transferred to any other network. It learns a per-channel scale to enhance important features and weaken unimportant ones, so that the extracted features are more discriminative.
Res2Net is a brand-new improvement of ResNet proposed in 2019. The solution can be easily integrated with other excellent modules. Without increasing the amount of calculation, the performance on ImageNet, CIFAR-100 and other data sets exceeds ResNet. Res2Net, with its simple structure and superior performance, further explores the multi-scale representation capability of CNN at a more fine-grained level. Res2Net reveals a new dimension to improve model accuracy, called scale, which is an essential and more effective factor in addition to the existing dimensions of depth, width, and cardinality. The network also performs well in other visual tasks such as object detection and image segmentation.
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png)
![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png)
![](../../images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png)
At present, there are a total of 24 pretrained models of these three categories open-sourced by PaddleClas, and their indicators are shown in the figures above. It can be seen from the diagrams that, under the same FLOPs and Params, the improved models tend to have higher accuracy, but their inference speed is often inferior to the ResNet series. On the other hand, Res2Net performs better: compared with the group operation in ResNeXt and the SE structure in SE-ResNet, Res2Net tends to have better accuracy at the same FLOPs, Params and inference speed.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| Res2Net50_26w_4s | 0.793 | 0.946 | 0.780 | 0.936 | 8.520 | 25.700 |
| Res2Net50_vd_26w_4s | 0.798 | 0.949 | | | 8.370 | 25.060 |
| Res2Net50_vd_26w_4s_ssld | 0.831 | 0.966 | | | 8.370 | 25.060 |
| Res2Net50_14w_8s | 0.795 | 0.947 | 0.781 | 0.939 | 9.010 | 25.720 |
| Res2Net101_vd_26w_4s | 0.806 | 0.952 | | | 16.670 | 45.220 |
| Res2Net101_vd_26w_4s_ssld | 0.839 | 0.971 | | | 16.670 | 45.220 |
| Res2Net200_vd_26w_4s | 0.812 | 0.957 | | | 31.490 | 76.210 |
| Res2Net200_vd_26w_4s_ssld | **0.851** | 0.974 | | | 31.490 | 76.210 |
| ResNeXt50_32x4d | 0.778 | 0.938 | 0.778 | | 8.020 | 23.640 |
| ResNeXt50_vd_32x4d | 0.796 | 0.946 | | | 8.500 | 23.660 |
| ResNeXt50_64x4d | 0.784 | 0.941 | | | 15.060 | 42.360 |
| ResNeXt50_vd_64x4d | 0.801 | 0.949 | | | 15.540 | 42.380 |
| ResNeXt101_32x4d | 0.787 | 0.942 | 0.788 | | 15.010 | 41.540 |
| ResNeXt101_vd_32x4d | 0.803 | 0.951 | | | 15.490 | 41.560 |
| ResNeXt101_64x4d | 0.784 | 0.945 | 0.796 | | 29.050 | 78.120 |
| ResNeXt101_vd_64x4d | 0.808 | 0.952 | | | 29.530 | 78.140 |
| ResNeXt152_32x4d | 0.790 | 0.943 | | | 22.010 | 56.280 |
| ResNeXt152_vd_32x4d | 0.807 | 0.952 | | | 22.490 | 56.300 |
| ResNeXt152_64x4d | 0.795 | 0.947 | | | 43.030 | 107.570 |
| ResNeXt152_vd_64x4d | 0.811 | 0.953 | | | 43.520 | 107.590 |
| SE_ResNet18_vd | 0.733 | 0.914 | | | 4.140 | 11.800 |
| SE_ResNet34_vd | 0.765 | 0.932 | | | 7.840 | 21.980 |
| SE_ResNet50_vd | 0.795 | 0.948 | | | 8.670 | 28.090 |
| SE_ResNeXt50_32x4d | 0.784 | 0.940 | 0.789 | 0.945 | 8.020 | 26.160 |
| SE_ResNeXt50_vd_32x4d | 0.802 | 0.949 | | | 10.760 | 26.280 |
| SE_ResNeXt101_32x4d | 0.7939 | 0.9443 | 0.793 | 0.950 | 15.020 | 46.280 |
| SENet154_vd | 0.814 | 0.955 | | | 45.830 | 114.290 |
<a name='3'></a>
## 3. Inference speed based on V100 GPU
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-----------------------|-----------|-------------------|--------------------------|
| Res2Net50_26w_4s | 224 | 256 | 4.148 |
| Res2Net50_vd_26w_4s | 224 | 256 | 4.172 |
| Res2Net50_14w_8s | 224 | 256 | 5.113 |
| Res2Net101_vd_26w_4s | 224 | 256 | 7.327 |
| Res2Net200_vd_26w_4s | 224 | 256 | 12.806 |
| ResNeXt50_32x4d | 224 | 256 | 10.964 |
| ResNeXt50_vd_32x4d | 224 | 256 | 7.566 |
| ResNeXt50_64x4d | 224 | 256 | 13.905 |
| ResNeXt50_vd_64x4d | 224 | 256 | 14.321 |
| ResNeXt101_32x4d | 224 | 256 | 14.915 |
| ResNeXt101_vd_32x4d | 224 | 256 | 14.885 |
| ResNeXt101_64x4d | 224 | 256 | 28.716 |
| ResNeXt101_vd_64x4d | 224 | 256 | 28.398 |
| ResNeXt152_32x4d | 224 | 256 | 22.996 |
| ResNeXt152_vd_32x4d | 224 | 256 | 22.729 |
| ResNeXt152_64x4d | 224 | 256 | 46.705 |
| ResNeXt152_vd_64x4d | 224 | 256 | 46.395 |
| SE_ResNet18_vd | 224 | 256 | 1.694 |
| SE_ResNet34_vd | 224 | 256 | 2.786 |
| SE_ResNet50_vd | 224 | 256 | 3.749 |
| SE_ResNeXt50_32x4d | 224 | 256 | 8.924 |
| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.011 |
| SE_ResNeXt101_32x4d | 224 | 256 | 19.204 |
| SENet154_vd | 224 | 256 | 50.406 |
<a name='4'></a>
## 4. Inference speed based on T4 GPU
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
| Res2Net50_26w_4s | 224 | 256 | 3.56067 | 6.61827 | 11.41566 | 4.47188 | 9.65722 | 17.54535 |
| Res2Net50_vd_26w_4s | 224 | 256 | 3.69221 | 6.94419 | 11.92441 | 4.52712 | 9.93247 | 18.16928 |
| Res2Net50_14w_8s | 224 | 256 | 4.45745 | 7.69847 | 12.30935 | 5.4026 | 10.60273 | 18.01234 |
| Res2Net101_vd_26w_4s | 224 | 256 | 6.53122 | 10.81895 | 18.94395 | 8.08729 | 17.31208 | 31.95762 |
| Res2Net200_vd_26w_4s | 224 | 256 | 11.66671 | 18.93953 | 33.19188 | 14.67806 | 32.35032 | 63.65899 |
| ResNeXt50_32x4d | 224 | 256 | 7.61087 | 8.88918 | 12.99674 | 7.56327 | 10.6134 | 18.46915 |
| ResNeXt50_vd_32x4d | 224 | 256 | 7.69065 | 8.94014 | 13.4088 | 7.62044 | 11.03385 | 19.15339 |
| ResNeXt50_64x4d | 224 | 256 | 13.78688 | 15.84655 | 21.79537 | 13.80962 | 18.4712 | 33.49843 |
| ResNeXt50_vd_64x4d | 224 | 256 | 13.79538 | 15.22201 | 22.27045 | 13.94449 | 18.88759 | 34.28889 |
| ResNeXt101_32x4d | 224 | 256 | 16.59777 | 17.93153 | 21.36541 | 16.21503 | 19.96568 | 33.76831 |
| ResNeXt101_vd_32x4d | 224 | 256 | 16.36909 | 17.45681 | 22.10216 | 16.28103 | 20.25611 | 34.37152 |
| ResNeXt101_64x4d | 224 | 256 | 30.12355 | 32.46823 | 38.41901 | 30.4788 | 36.29801 | 68.85559 |
| ResNeXt101_vd_64x4d | 224 | 256 | 30.34022 | 32.27869 | 38.72523 | 30.40456 | 36.77324 | 69.66021 |
| ResNeXt152_32x4d | 224 | 256 | 25.26417 | 26.57001 | 30.67834 | 24.86299 | 29.36764 | 52.09426 |
| ResNeXt152_vd_32x4d | 224 | 256 | 25.11196 | 26.70515 | 31.72636 | 25.03258 | 30.08987 | 52.64429 |
| ResNeXt152_64x4d | 224 | 256 | 46.58293 | 48.34563 | 56.97961 | 46.7564 | 56.34108 | 106.11736 |
| ResNeXt152_vd_64x4d | 224 | 256 | 47.68447 | 48.91406 | 57.29329 | 47.18638 | 57.16257 | 107.26288 |
| SE_ResNet18_vd | 224 | 256 | 1.61823 | 3.1391 | 4.60282 | 1.7691 | 4.19877 | 7.5331 |
| SE_ResNet34_vd | 224 | 256 | 2.67518 | 5.04694 | 7.18946 | 2.88559 | 7.03291 | 12.73502 |
| SE_ResNet50_vd | 224 | 256 | 3.65394 | 7.568 | 12.52793 | 4.28393 | 10.38846 | 18.33154 |
| SE_ResNeXt50_32x4d | 224 | 256 | 9.06957 | 11.37898 | 18.86282 | 8.74121 | 13.563 | 23.01954 |
| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.25016 | 11.85045 | 25.57004 | 9.17134 | 14.76192 | 19.914 |
| SE_ResNeXt101_32x4d | 224 | 256 | 19.34455 | 20.6104 | 32.20432 | 18.82604 | 25.31814 | 41.97758 |
| SENet154_vd | 224 | 256 | 49.85733 | 54.37267 | 74.70447 | 53.79794 | 66.31684 | 121.59885 |

@ -0,0 +1,28 @@
# SwinTransformer
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
Swin Transformer is a new vision Transformer that capably serves as a general-purpose backbone for computer vision. It is a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. [Paper](https://arxiv.org/abs/2103.14030).
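To give a concrete sense of the windowing scheme, the sketch below partitions a feature map into non-overlapping windows and applies the cyclic shift used for cross-window connection. The window size and feature shapes are illustrative, and the real implementation additionally handles attention masking for the shifted windows.
```python
import numpy as np

def window_partition(x, ws):
    """Split a (H, W, C) feature map into non-overlapping (ws, ws, C) windows."""
    h, w, c = x.shape
    x = x.reshape(h // ws, ws, w // ws, ws, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, c)

def cyclic_shift(x, ws):
    """Shift the map by half a window so the next block attends across old window borders."""
    return np.roll(x, shift=(-(ws // 2), -(ws // 2)), axis=(0, 1))

feat = np.random.randn(56, 56, 96)
wins = window_partition(feat, ws=7)
print(wins.shape)                                      # (64, 7, 7, 96): an 8x8 grid of windows
shifted_wins = window_partition(cyclic_shift(feat, ws=7), ws=7)
print(shifted_wins.shape)                              # same window count, shifted content
```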
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | 0.812 | 0.955 | 4.5 | 28 |
| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | 0.832 | 0.962 | 8.7 | 50 |
| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | 0.835 | 0.965 | 15.4 | 88 |
| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | 0.845 | 0.970 | 47.1 | 88 |
| SwinTransformer_base_patch4_window7_224<sup>[1]</sup> | 0.8487 | 0.9746 | 0.852 | 0.975 | 15.4 | 88 |
| SwinTransformer_base_patch4_window12_384<sup>[1]</sup> | 0.8642 | 0.9807 | 0.864 | 0.980 | 47.1 | 88 |
| SwinTransformer_large_patch4_window7_224<sup>[1]</sup> | 0.8596 | 0.9783 | 0.863 | 0.979 | 34.5 | 197 |
| SwinTransformer_large_patch4_window12_384<sup>[1]</sup> | 0.8719 | 0.9823 | 0.873 | 0.982 | 103.9 | 197 |
[1]: Pre-trained on the ImageNet-22k dataset and then fine-tuned on ImageNet-1k.
**Note**: The difference in accuracy from Reference is due to the difference in data preprocessing.

@ -0,0 +1,19 @@
# TNT series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
TNT (Transformer-iN-Transformer) series models were proposed by Huawei Noah's Ark Lab in 2021 for modeling both patch-level and pixel-level representations. In each TNT block, an outer transformer block processes patch embeddings, and an inner transformer block extracts local features from pixel embeddings. The pixel-level features are projected to the patch-embedding space by a linear transformation layer and then added to the patch embeddings. By stacking TNT blocks, the TNT model for image recognition is built. Experiments on the ImageNet benchmark and downstream tasks demonstrate the superiority and efficiency of the TNT architecture. For example, TNT achieves 81.3% top-1 accuracy on ImageNet, which is 1.5% higher than that of DeiT with similar computational cost. [Paper](https://arxiv.org/abs/2103.00112).
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
| TNT_small | 23.8 | 5.2 | 81.12 | 95.56 |

@ -0,0 +1,24 @@
# Twins
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
The Twins network includes Twins-PCPVT and Twins-SVT, which focus on the meticulous design of the spatial attention mechanism, resulting in a simple but more effective solution. Since the architecture only involves matrix multiplication, for which current deep learning frameworks are highly optimized, it is very efficient and easy to implement. Moreover, it achieves excellent performance in a variety of downstream vision tasks such as image classification, object detection, and semantic segmentation. [Paper](https://arxiv.org/abs/2104.13840).
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| pcpvt_small | 0.8082 | 0.9552 | 0.812 | - | 3.7 | 24.1 |
| pcpvt_base | 0.8242 | 0.9619 | 0.827 | - | 6.4 | 43.8 |
| pcpvt_large | 0.8273 | 0.9650 | 0.831 | - | 9.5 | 60.9 |
| alt_gvt_small | 0.8140 | 0.9546 | 0.817 | - | 2.8 | 24 |
| alt_gvt_base | 0.8294 | 0.9621 | 0.832 | - | 8.3 | 56 |
| alt_gvt_large | 0.8331 | 0.9642 | 0.837 | - | 14.8 | 99.2 |
**Note**: The difference in accuracy from Reference is due to the difference in data preprocessing.

@ -0,0 +1,41 @@
# ViT and DeiT series
---
## Catalogue
* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
<a name='1'></a>
## 1. Overview
ViT (Vision Transformer) series models were proposed by Google in 2020. These models use only the standard Transformer structure and completely abandon convolutions: an image is split into multiple patches, which are then fed into the Transformer, demonstrating the potential of Transformers in the CV field. [Paper](https://arxiv.org/abs/2010.11929)
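Splitting an image into patches amounts to a reshape; below is a minimal NumPy sketch that turns an image into a sequence of flattened patches. The patch size and shapes are illustrative, and the real model then applies a learned linear projection and adds position embeddings.
```python
import numpy as np

def image_to_patches(img, p):
    """img: (H, W, C) -> (num_patches, p*p*C) sequence of flattened patches."""
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)       # group the two patch-grid axes together
    return x.reshape(-1, p * p * c)

img = np.random.rand(224, 224, 3)
patches = image_to_patches(img, p=16)
print(patches.shape)  # (196, 768): a 14x14 grid of patches, each of dimension 16*16*3
```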
DeiT (Data-efficient Image Transformers) series models were proposed by Facebook at the end of 2020. Aiming at the problem that ViT models require training on large-scale datasets, DeiT improved on them and finally achieved 83.1% top-1 accuracy on ImageNet. More importantly, by using a convolutional model as the teacher and performing knowledge distillation on these models, a top-1 accuracy of 85.2% can be achieved on the ImageNet dataset.
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ViT_small_patch16_224 | 0.7769 | 0.9342 | 0.7785 | 0.9342 | | |
| ViT_base_patch16_224 | 0.8195 | 0.9617 | 0.8178 | 0.9613 | | |
| ViT_base_patch16_384 | 0.8414 | 0.9717 | 0.8420 | 0.9722 | | |
| ViT_base_patch32_384 | 0.8176 | 0.9613 | 0.8166 | 0.9613 | | |
| ViT_large_patch16_224 | 0.8323 | 0.9650 | 0.8306 | 0.9644 | | |
| ViT_large_patch16_384 | 0.8513 | 0.9736 | 0.8517 | 0.9736 | | |
| ViT_large_patch32_384 | 0.8153 | 0.9608 | 0.815 | - | | |
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| DeiT_tiny_patch16_224 | 0.718 | 0.910 | 0.722 | 0.911 | | |
| DeiT_small_patch16_224 | 0.796 | 0.949 | 0.799 | 0.950 | | |
| DeiT_base_patch16_224 | 0.817 | 0.957 | 0.818 | 0.956 | | |
| DeiT_base_patch16_384 | 0.830 | 0.962 | 0.829 | 0.972 | | |
| DeiT_tiny_distilled_patch16_224 | 0.741 | 0.918 | 0.745 | 0.919 | | |
| DeiT_small_distilled_patch16_224 | 0.809 | 0.953 | 0.812 | 0.954 | | |
| DeiT_base_distilled_patch16_224 | 0.831 | 0.964 | 0.834 | 0.965 | | |
| DeiT_base_distilled_patch16_384 | 0.851 | 0.973 | 0.852 | 0.972 | | |
Params, FLOPs, Inference speed and other information are coming soon.

@ -0,0 +1,30 @@
models
================================
.. toctree::
:maxdepth: 2
DPN_DenseNet_en.md
models_intro_en.md
RepVGG_en.md
EfficientNet_and_ResNeXt101_wsl_en.md
ViT_and_DeiT_en.md
SwinTransformer_en.md
Others_en.md
SEResNext_and_Res2Net_en.md
ESNet_en.md
HRNet_en.md
ReXNet_en.md
Inception_en.md
TNT_en.md
RedNet_en.md
DLA_en.md
ResNeSt_RegNet_en.md
PP-LCNet_en.md
HarDNet_en.md
ResNet_and_vd_en.md
LeViT_en.md
Mobile_en.md
MixNet_en.md
Twins_en.md
PVTV2_en.md

@ -0,0 +1,360 @@
# Image Classification
------
Image Classification is a fundamental task that classifies the image by semantic information and assigns it to a specific label. Image Classification is the foundation of Computer Vision tasks, such as object detection, image segmentation, object tracking and behavior analysis. Image Classification has comprehensive applications, including face recognition and smart video analysis in the security and protection field, traffic scenario recognition in the traffic field, image retrieval and electronic photo album classification in the internet industry, and image recognition in the medical industry.
Generally speaking, Image Classification attempts to comprehend an entire image as a whole through feature engineering and assigns labels with a classifier. Hence, how to extract the features of an image is the essential part. Before deep learning, the most commonly used classification method was the Bag of Words model. Image Classification based on deep learning, however, can learn a hierarchical feature description through supervised and unsupervised learning, replacing manual feature selection. Recently, Convolutional Neural Networks have achieved remarkable performance in the image field: a CNN takes raw pixel information as input, extracts features with convolutions, and directly outputs the classification result. This end-to-end approach achieves strong performance and is widely applied.
Image Classification is a very basic but important field in the subject of computer vision. Its research results have always influenced the development of computer vision and even deep learning. Image classification has many sub-fields, such as multi-label image classification and fine-grained image classification. Here is only a brief description of single-label image classification.
See [here](../algorithm_introduction/image_classification_en.md) for the detailed introduction of image classification algorithms.
## Catalogue
- [1. Dataset Introduction](#1)
- [1.1 ImageNet-1k](#1.1)
- [1.2 CIFAR-10/CIFAR-100](#1.2)
- [2. Image Classification Process](#2)
- [2.1 Data and Its Preprocessing](#2.1)
- [2.2 Prepare the Model](#2.2)
- [2.3 Train the Model](#2.3)
- [2.4 Evaluate the Model](#2.4)
- [3. Application Methods](#3)
- [3.1 Training and Evaluation on CPU or Single GPU](#3.1)
- [3.1.1 Model Training](#3.1.1)
- [3.1.2 Model Finetuning](#3.1.2)
- [3.1.3 Resume Training](#3.1.3)
- [3.1.4 Model Evaluation](#3.1.4)
- [3.2 Training and Evaluation on Linux+ Multi-GPU](#3.2)
- [3.2.1 Model Training](#3.2.1)
- [3.2.2 Model Finetuning](#3.2.2)
- [3.2.3 Resume Training](#3.2.3)
- [3.2.4 Model Evaluation](#3.2.4)
- [3.3 Use the Pre-trained Model to Predict](#3.3)
- [3.4 Use the Inference Model to Predict](#3.4)
<a name="1"></a>
## 1. Dataset Introduction
<a name="1.1"></a>
### 1.1 ImageNet-1k
ImageNet is a large-scale visual database for visual object recognition research. More than 14 million images have been manually annotated to point out the objects in the picture, and more than 1 million images also provide bounding boxes. ImageNet-1k is a subset of the ImageNet dataset that contains 1000 categories: the training set contains 1,281,167 images and the validation set contains 50,000 images. Since 2010, the ImageNet project has held an image classification competition every year, the ImageNet Large-scale Visual Recognition Challenge (ILSVRC), and the dataset used in the challenge is ImageNet-1k. So far, ImageNet-1k has become one of the most important datasets for the development of computer vision, and it has promoted the development of the entire field. The initialization models of many computer vision downstream tasks are based on weights trained on this dataset.
<a name="1.2"></a>
### 1.2 CIFAR-10/CIFAR-100
The CIFAR-10 dataset consists of 60,000 color images in 10 categories, with an image resolution of 32x32, and each category has 6000 images, including 5000 in the training set and 1000 in the validation set. 10 different classes represent airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships and trucks. The CIFAR-100 data set is an extension of CIFAR-10. It consists of 60,000 color images in 100 classes, with an image resolution of 32x32, and each class has 600 images, including 500 in the training set and 100 in the validation set. Researchers can try different algorithms quickly because these two data sets are small in scale. These two datasets are also commonly used data sets for testing the quality of models in the image classification field.
<a name="2"></a>
## 2. Image Classification Process
The prepared training data is preprocessed and then passed through the image classification model. The output of the model and the real label are used in a cross-entropy loss function, which describes the convergence direction of the model. The gradients of the loss function are then computed and propagated back to the model, whose weights are updated by the optimizer. Finally, an image classification model can be obtained.
<a name="2.1"></a>
### 2.1 Data Preprocessing
The quality and quantity of data often determine the performance of a model. In the field of image classification, data includes images and labels. In most cases, labeled data is scarce, so the amount of data is difficult to reach the level of saturation of the model. In order to enable the model to learn more image features, a lot of image transformation or data augmentation is required before the image enters the model, so as to ensure the diversity of the input image data and give the model better generalization capabilities. PaddleClas provides standard image transformations for training ImageNet-1k, as well as 8 data augmentation methods. For the related code, please refer to [data preprocess](../../../ppcls/data/preprocess); the configuration files can be found in [Data Augmentation Configuration File](../../../ppcls/configs/ImageNet/DataAugment). For the related algorithms, please refer to [data augment algorithms](../algorithm_introduction/DataAugmentation_en.md).
<a name="2.2"></a>
### 2.2 Prepare the Model
After the data is determined, the model often determines the upper limit of the final accuracy. In the field of image classification, new classic models keep emerging, and PaddleClas provides 35 series and a total of 164 ImageNet pre-trained models. For specific accuracy, speed and other indicators, please refer to [Backbone Network Introduction](../algorithm_introduction/ImageNet_models_en.md).
<a name="2.3"></a>
### 2.3 Train
After preparing the data and model, you can start training the model and update the parameters of the model. After many iterations, a trained model can finally be obtained for image classification tasks. The training process of image classification requires a lot of experience and involves the setting of many hyperparameters. PaddleClas provides a series of [training tuning methods](./train_strategy_en.md), which can quickly help you obtain a high-precision model.
PaddleClas supports training with VisualDL to visualize the metrics. VisualDL, a visualization and analysis tool of PaddlePaddle, provides a variety of charts to show the trends of parameters and visualizes model structures, data samples, histograms of tensors, PR curves, ROC curves and high-dimensional data distributions. It enables users to understand the training process and the model structure more clearly and intuitively so as to optimize models efficiently. For more information, please refer to [VisualDL](../others/VisualDL_en.md).
<a name="2.4"></a>
### 2.4 Evaluation
After a model is trained, the evaluation results of the model on the validation set can determine the performance of the model. The evaluation index is generally Top1-Acc or Top5-Acc. The higher the index, the better the model performance.
<a name="3"></a>
## 3. Application Methods
Please refer to [Installation](../installation/install_paddleclas_en.md) to setup environment at first, and prepare flower102 dataset by following the instruction mentioned in the [Quick Start](../quick_start/quick_start_classification_new_user_en.md).
So far, PaddleClas supports the following training/evaluation environments:
```
└── CPU/Single GPU
├── Linux
└── Windows
└── Multi card GPU
└── Linux
```
<a name="3.1"></a>
### 3.1 Training and Evaluation on CPU or Single GPU
If training and evaluation are performed on a CPU or a single GPU, it is recommended to use the `tools/train.py` and `tools/eval.py` scripts. For training and evaluation in a multi-GPU environment on Linux, please refer to [3.2 Training and evaluation on Linux+GPU](#3.2).
<a name="3.1.1"></a>
#### 3.1.1 Model Training
After preparing the configuration file, the training process can be started in the following way.
```shell
python3 tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Arch.pretrained=False \
-o Global.device=gpu
```
Among them, `-c` is used to specify the path of the configuration file, and `-o` is used to specify the parameters that need to be modified or added. `-o Arch.pretrained=False` means not to use a pre-trained model, and `-o Global.device=gpu` means to use the GPU for training. If you want to use the CPU for training, you need to set `Global.device` to `cpu`.
Of course, you can also directly modify the configuration file to update the configuration. For specific configuration parameters, please refer to [Configuration Document](config_description_en.md).
The output log examples are as follows:
- If mixup or cutmix is used in training, top-1 and top-k (5 by default) accuracy will not be printed in the log:
```
...
[Train][Epoch 3/20][Avg]CELoss: 6.46287, loss: 6.46287
...
[Eval][Epoch 3][Avg]CELoss: 5.94309, loss: 5.94309, top1: 0.01961, top5: 0.07941
...
```
- If mixup or cutmix is not used during training, in addition to the above information, top-1 and top-k (5 by default) accuracy will also be printed in the log:
```
...
[Train][Epoch 3/20][Avg]CELoss: 6.12570, loss: 6.12570, top1: 0.01765, top5: 0.06961
...
[Eval][Epoch 3][Avg]CELoss: 5.40727, loss: 5.40727, top1: 0.07549, top5: 0.20980
...
```
During training, you can view loss changes in real time through `VisualDL`, see [VisualDL](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/en/extension/VisualDL_en.md) for details.
<a name="3.1.2"></a>
#### 3.1.2 Model Finetuning
After preparing the configuration file, you can finetune by loading the pre-trained weights. The command is as follows:
```shell
python3 tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Arch.pretrained=True \
-o Global.device=gpu
```
Among them, `Arch.pretrained` is used to set the address from which to load the pretrained weights. When using it, you need to replace it with the path of your own pretrained weights, or you can modify the path directly in the configuration file. You can also set it to `True` to use the pretrained weights trained on ImageNet-1k.
We also provide a lot of pre-trained models trained on the ImageNet-1k dataset. For the model list and download address, please refer to the [model library overview](../algorithm_introduction/ImageNet_models_en.md).
<a name="3.1.3"></a>
#### 3.1.3 Resume Training
If the training process is terminated for some reason, you can also load the checkpoints to continue training.
```shell
python3 tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
-o Global.device=gpu
```
The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.
**Note**:
- The `-o Global.checkpoints` parameter does not need to include the suffix of the checkpoints. The above training command will generate the checkpoints shown below during the training process. If you want to continue training from epoch `5`, just set `Global.checkpoints` to `./output/MobileNetV3_large_x1_0/epoch_5`; PaddleClas will automatically fill in the `pdopt` and `pdparams` suffixes. Files in the output directory are structured as follows:
```
output
├── MobileNetV3_large_x1_0
│ ├── best_model.pdopt
│ ├── best_model.pdparams
│ ├── best_model.pdstates
│ ├── epoch_1.pdopt
│ ├── epoch_1.pdparams
│ ├── epoch_1.pdstates
.
.
.
```
<a name="3.1.4"></a>
#### 3.1.4 Model Evaluation
The model evaluation process can be started as follows.
```shell
python3 tools/eval.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
The above command will use `./configs/quick_start/MobileNetV3_large_x1_0.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model`. You can also set the evaluation by changing the parameters in the configuration file, or you can update the configuration with the `-o` parameter, as shown above.
Some of the configurable evaluation parameters are described as follows:
- `Arch.name`: Model name
- `Global.pretrained_model`: The path of the model file to be evaluated
**Note**: When loading the model to be evaluated, you only need to specify the path of the model file instead of the suffix. PaddleClas will automatically add the `.pdparams` suffix, as in [3.1.3 Resume Training](#3.1.3).
<a name="3.2"></a>
### 3.2 Training and Evaluation on Linux+ Multi-GPU
If you want to run PaddleClas on Linux with GPU, it is highly recommended to use `paddle.distributed.launch` to start the model training script(`tools/train.py`) and evaluation script(`tools/eval.py`), which can start on multi-GPU environment more conveniently.
<a name="3.2.1"></a>
#### 3.2.1 Model Training
The training process can be started in the following way. `paddle.distributed.launch` specifies the GPU cards to use through the `--gpus` option:
```shell
# PaddleClas initiates multi-card multi-process training via launch
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml
```
The format of output log information is the same as above, see [3.1.1 Model training](#3.1.1) for details.
<a name="3.2.2"></a>
#### 3.2.2 Model Finetuning
After configuring the yaml file, you can finetune it by loading the pretrained weights. The command is as below.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Arch.pretrained=True
```
Among them, `Arch.pretrained` can be set to `True` or `False`, or to the path of the pretrained weights. When using your own weights, replace it with the path to them, or modify the path directly in the configuration file.
There are many examples of model finetuning in the [new user version](../quick_start/quick_start_classification_new_user_en.md) and [professional version](../quick_start/quick_start_classification_professional_en.md) of PaddleClas Trial in 30 mins. You can refer to these tutorials to finetune the model on a specific dataset.
<a name="3.2.3"></a>
#### 3.2.3 Resume Training
If the training process is terminated for some reason, you can also load the checkpoints to continue training.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
-o Global.device=gpu
```
The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter as described in [3.1.3 Resume training](#3.1.3).
<a name="3.2.4"></a>
#### 3.2.4 Model Evaluation
The model evaluation process can be started as follows.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
tools/eval.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
For the parameter descriptions, see [3.1.4 Model evaluation](#3.1.4) for details.
<a name="3.3"></a>
### 3.3 Use the Pre-trained Model to Predict
After the training is completed, you can predict with the pre-trained model obtained from training. A complete example is provided in `tools/infer.py` of the model library; run the following command to conduct model prediction:
```
python3 tools/infer.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg \
-o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
```
Parameters:
- `Infer.infer_imgs`: The path of the image file or folder to be predicted.
- `Global.pretrained_model`: Weight file path, such as `./output/MobileNetV3_large_x1_0/best_model`
<a name="3.4"></a>
### 3.4 Use the Inference Model to Predict
By exporting the inference model, PaddlePaddle supports inference using prediction engines, which will be introduced next. Firstly, you should export the inference model using `tools/export_model.py`.
```shell
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o Global.pretrained_model=output/MobileNetV3_large_x1_0/best_model
```
Among them, the `Global.pretrained_model` parameter is used to specify the model file path, which does not need to include the file suffix (see [3.1.3 Resume Training](#3.1.3)).
The above command will generate the model structure file (`inference.pdmodel`) and the model weight file (`inference.pdiparams`), and then the inference engine can be used for inference:
Go to the deploy directory:
```
cd deploy
```
Run inference with the inference engine. Since `class_id_map_file` defaults to the label mapping file of the ImageNet-1k dataset, it should be set to `None` here.
```shell
python3 python/predict_cls.py \
-c configs/inference_cls.yaml \
-o Global.infer_imgs=../dataset/flowers102/jpg/image_00001.jpg \
-o Global.inference_model_dir=../inference/ \
-o PostProcess.Topk.class_id_map_file=None
```
Among them:
- `Global.infer_imgs`: The path of the image file to be predicted.
- `Global.inference_model_dir`: The directory of the inference model files, such as `../inference/`.
- `Global.use_tensorrt`: Whether to use TensorRT, `False` by default.
- `Global.use_gpu`: Whether to use the GPU, `True` by default.
- `Global.enable_mkldnn`: Whether to use `MKL-DNN`, `False` by default. It is only valid when `Global.use_gpu` is `False`.
- `Global.use_fp16`: Whether to enable FP16, `False` by default.
Note: If you want to use `Transformer` series models, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of the model and set `resize_short=384` and `resize=384`; one possible way to do this is sketched below.
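A possible preprocessing adjustment for 384-input models is sketched below. The operator names follow the `transform_ops` conventions used elsewhere in the PaddleClas configs; the remaining operators of the inference config (normalization, etc.) stay unchanged, and the exact field layout should be checked against your own `configs/inference_cls.yaml`.
```yaml
PreProcess:
  transform_ops:
    - ResizeImage:
        resize_short: 384   # resize the short edge to 384
    - CropImage:
        size: 384           # center crop to 384x384
    # ... the remaining operators are left unchanged
```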
If you want to evaluate the speed of the model, it is recommended to enable TensorRT for acceleration on GPU and MKL-DNN on CPU.

@ -0,0 +1,270 @@
# Configuration Instruction
------
## Introduction
The parameters in the PaddleClas configuration files (`ppcls/configs/*.yaml`) are described here so that you can customize or modify the hyperparameter configuration more quickly.
## Details
### Catalogue
- [1. Classification model](#1)
- [1.1 Global Configuration](#1.1)
- [1.2 Architecture](#1.2)
- [1.3 Loss function](#1.3)
- [1.4 Optimizer](#1.4)
- [1.5 Data reading module(DataLoader)](#1.5)
- [1.5.1 dataset](#1.5.1)
- [1.5.2 sampler](#1.5.2)
- [1.5.3 loader](#1.5.3)
- [1.6 Evaluation metric](#1.6)
- [1.7 Inference](#1.7)
- [2. Distillation model](#2)
- [2.1 Architecture](#2.1)
- [2.2 Loss function](#2.2)
- [2.3 Evaluation metric](#2.3)
- [3. Recognition model](#3)
- [3.1 Architecture](#3.1)
- [3.2 Evaluation metric](#3.2)
<a name="1"></a>
### 1. Classification model
Here the configuration of `ResNet50_vd` on `ImageNet-1k` is used as an example to explain each parameter in detail. [Configure Path](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml).
<a name="1.1"></a>
#### 1.1 Global Configuration
| Parameter name | Specific meaning | Default value | Optional value |
| ------------------ | ------------------------------------------------------- | ---------------- | ----------------- |
| checkpoints | Breakpoint model path for resuming training | null | str |
| pretrained_model | Pre-trained model path | null | str |
| output_dir | Save model path | "./output/" | str |
| save_interval | How many epochs to save the model at each interval | 1 | int |
| eval_during_train | Whether to evaluate during training | True | bool |
| eval_interval | How many epochs to evaluate at each interval | 1 | int |
| epochs | Total number of epochs in training | | int |
| print_batch_step | How many mini-batches to print out at each interval | 10 | int |
| use_visualdl | Whether to visualize the training process with visualdl | False | bool |
| image_shape | Image size | [3, 224, 224] | list, shape: (3,) |
| save_inference_dir | Inference model save path | "./inference" | str |
| eval_mode | Mode of evaluation | "classification" | "retrieval" |
**Note**: The http address of a pre-trained model can be filled in `pretrained_model`.
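As a reference, the fields above map to a top-level `Global` section of the yaml file. A minimal sketch assembled from the default values in the table (the `epochs` value is only an example, since the table leaves it unspecified):
```yaml
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: "./output/"
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  epochs: 120              # example value; set according to your training schedule
  print_batch_step: 10
  use_visualdl: False
  image_shape: [3, 224, 224]
  save_inference_dir: "./inference"
  eval_mode: "classification"
```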
<a name="1.2"></a>
#### 1.2 Architecture
| Parameter name | Specific meaning | Default value | Optional value |
| -------------- | ----------------- | ------------ | --------------------- |
| name | Model Arch name | ResNet50 | PaddleClas model arch |
| class_num | Category number | 1000 | int |
| pretrained | Pre-trained model | False | bool, str |
**Note**: Here `pretrained` can be set to `True`, `False`, or the path of the pre-trained weights. In addition, `pretrained` has no effect when `Global.pretrained_model` is also set to a weights path.
<a name="1.3"></a>
#### 1.3 Loss function
| Parameter name | Specific meaning | Default value | Optional value |
| -------------- | ------------------------------------------- | ------------ | ---------------------- |
| CELoss | cross-entropy loss function | —— | —— |
| CELoss.weight | The weight of CELoss in the whole Loss | 1.0 | float |
| CELoss.epsilon | The epsilon value of label_smooth in CELoss | 0.1 | float, between 0 and 1 |
<a name="1.4"></a>
#### 1.4 Optimizer
| Parameter name | Specific meaning | Default value | Optional value |
| ----------------- | -------------------------------- | ------------ | -------------------------------------------------- |
| name | optimizer method name | "Momentum" | Other optimizer including "RmsProp" |
| momentum | momentum value | 0.9 | float |
| lr.name | method of dropping learning rate | "Cosine" | Other dropping methods of "Linear" and "Piecewise" |
| lr.learning_rate | initial value of learning rate | 0.1 | float |
| lr.warmup_epoch | warmup rounds | 0 | int, such as 5 |
| regularizer.name | regularization method name | "L2" | ["L1", "L2"] |
| regularizer.coeff | regularization factor | 0.00007 | float |
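The fields in the table above correspond to an `Optimizer` section of the yaml file. A minimal sketch, assuming the default values listed above (the exact block in `ResNet50_vd.yaml` may differ slightly):
```yaml
Optimizer:
  name: Momentum
  momentum: 0.9
  lr:
    name: Cosine
    learning_rate: 0.1
  regularizer:
    name: 'L2'
    coeff: 0.00007
```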
**Note**: The required sub-parameters differ depending on `lr.name`. For example, when `lr.name=Piecewise`, the following parameters need to be added:
```
lr:
name: Piecewise
learning_rate: 0.1
decay_epochs: [30, 60, 90]
values: [0.1, 0.01, 0.001, 0.0001]
```
Refer to [learning_rate.py](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/optimizer/learning_rate.py) for how to add other methods and their parameters.
<a name="1.5"></a>
#### 1.5 Data reading module(DataLoader)
<a name="1.5.1"></a>
##### 1.5.1 dataset
| Parameter name | Specific meaning | Default value | Optional value |
| ------------------- | ------------------------------------ | ----------------------------------- | ------------------------------ |
| name | The name of the class that reads the data | ImageNetDataset | VeriWild and other Dataset types |
| image_root | The path where the dataset is stored | ./dataset/ILSVRC2012/ | str |
| cls_label_path | data label list | ./dataset/ILSVRC2012/train_list.txt | str |
| transform_ops | data preprocessing for single images | —— | —— |
| batch_transform_ops | Data preprocessing for batch images | —— | —— |
The parameter meaning of transform_ops:
| Function name | Parameter name | Specific meaning |
| -------------- | -------------- | --------------------- |
| DecodeImage | to_rgb | data to RGB |
|                | channel_first  | whether the image data is in CHW format |
| RandCropImage | size | Random crop |
| RandFlipImage | | Random flip |
| NormalizeImage | scale | Normalize scale value |
| | mean | Normalize mean value |
|                | std            | Normalize std value   |
| | order | Normalize order |
| CropImage | size | crop size |
| ResizeImage | resize_short | resize by short edge |
The parameter meaning of batch_transform_ops:
| Function name | Parameter name | Specific meaning |
| ------------- | -------------- | --------------------------------------- |
| MixupOperator | alpha | Mixup parameter value; the larger the value, the stronger the augmentation |
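As a reference, a `batch_transform_ops` entry using Mixup might look like the following sketch; the nesting mirrors the `transform_ops` format, and the `alpha` value of 0.2 is only an example:
```yaml
batch_transform_ops:
  - MixupOperator:
      alpha: 0.2   # larger alpha means stronger augmentation
```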
<a name="1.5.2"></a>
##### 1.5.2 sampler
| Parameter name | Specific meaning | Default value | Optional value |
| -------------- | ------------------------------------------------------------ | ----------------------- | -------------------------------------------------- |
| name | sampler type | DistributedBatchSampler | DistributedRandomIdentitySampler and other Sampler |
| batch_size | batch size | 64 | int |
| drop_last | Whether to drop the last incomplete batch that does not reach batch_size | False | bool |
| shuffle | whether to shuffle the data | True | bool |
<a name="1.5.3"></a>
##### 1.5.3 loader
| Parameter name | Specific meaning | Default value | Optional value |
| ----------------- | ---------------------------- | --------------- | ---------------- |
| num_workers | Number of data read threads | 4 | int |
| use_shared_memory | Whether to use shared memory | True | bool |
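Putting the two tables above together, the `sampler` and `loader` fields typically sit under `DataLoader.Train` (and `DataLoader.Eval`) in the yaml file. A minimal sketch using the default values listed above:
```yaml
DataLoader:
  Train:
    sampler:
      name: DistributedBatchSampler
      batch_size: 64
      drop_last: False
      shuffle: True
    loader:
      num_workers: 4
      use_shared_memory: True
```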
<a name="1.6"></a>
#### 1.6 Evaluation metric
| Parameter name | Specific meaning | Default value | Optional value |
| -------------- | ---------------- | --------------- | ---------------- |
| TopkAcc | TopkAcc | [1, 5] | list, int |
<a name="1.7"></a>
#### 1.7 Inference
| Parameter name | Specific meaning | Default value | Optional value |
| ----------------------------- | --------------------------------- | ------------------------------------- | ---------------- |
| infer_imgs | Image address to be inferred | docs/images/whl/demo.jpg | str |
| batch_size | batch size | 10 | int |
| PostProcess.name | Post-process name | Topk | str |
| PostProcess.topk | topk value | 5 | int |
| PostProcess.class_id_map_file | mapping file of class id and name | ppcls/utils/imagenet1k_label_list.txt | str |
**Note**: The `transforms` field in the Infer module is interpreted in the same way as `transform_ops` in the dataset section of the data reading module.
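As a reference, the fields above map to an `Infer` section such as the following sketch; the values are the defaults from the table, and the `transforms` list is elided here since it follows the `transform_ops` format mentioned in the note:
```yaml
Infer:
  infer_imgs: docs/images/whl/demo.jpg
  batch_size: 10
  # transforms: ...   # same operator format as transform_ops in the dataset section
  PostProcess:
    name: Topk
    topk: 5
    class_id_map_file: ppcls/utils/imagenet1k_label_list.txt
```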
<a name="2"></a>
### 2. Distillation model
**Note**: Here the training configuration for distilling `MobileNetV3_small_x1_0` with `MobileNetV3_large_x1_0` on `ImageNet-1k` is used as an example to explain the meaning of each parameter in detail. [Configure path](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/ImageNet/Distillation/mv3_large_x1_0_distill_mv3_small_x1_0.yaml). Only parameters that are distinct from the classification model are introduced here.
<a name="2.1"></a>
#### 2.1 Architecture
| Parameter name | Specific meaning | Default value | Optional value |
| ------------------ | --------------------------------------------------------- | ---------------------- | ---------------------------------- |
| name | model arch name | DistillationModel | —— |
| class_num | category number | 1000 | int |
| freeze_params_list | freeze_params_list | [True, False] | list |
| models | model list | [Teacher, Student] | list |
| Teacher.name | teacher model name | MobileNetV3_large_x1_0 | PaddleClas model |
| Teacher.pretrained | teacher model pre-trained weights | True | Boolean or pre-trained weight path |
| Teacher.use_ssld | whether teacher model pretrained weights are ssld weights | True | Boolean |
| infer_model_name | type of the model being inferred | Student | Teacher |
**Note**
1. list is represented in yaml as follows:
```
freeze_params_list:
- True
- False
```
2. Student's parameters are similar and will not be repeated.
<a name="2.2"></a>
#### 2.2 Loss function
| Parameter name | Specific meaning | Default value | Optional value |
| ----------------------------------- | ------------------------------------------------------------ | --------------- | ---------------- |
| DistillationCELoss | Distillation's cross-entropy loss function | —— | —— |
| DistillationCELoss.weight | Loss weight | 1.0 | float |
| DistillationCELoss.model_name_pairs | Model name pairs used to compute the loss | ["Student", "Teacher"] | —— |
| DistillationGTCELoss | Distillation's cross-entropy loss function between the model and the true label | —— | —— |
| DistillationGTCELoss.weight | Loss weight | 1.0 | float |
| DistillationGTCELoss.model_names | Model names whose outputs are compared with the real label | ["Student"] | —— |
<a name="2.3"></a>
#### 2.3 Evaluation metric
| Parameter name | Specific meaning | Default value | Optional value |
| ----------------------------- | ------------------- | ---------------------------- | ---------------- |
| DistillationTopkAcc | DistillationTopkAcc | including model_key and topk | —— |
| DistillationTopkAcc.model_key | the evaluated model | "Student" | "Teacher" |
| DistillationTopkAcc.topk | Topk value | [1, 5] | list, int |
**Note**: `DistillationTopkAcc` has the same meaning as `TopkAcc`, except that it is only used in distillation tasks.
<a name="3"></a>
### 3. Recognition model
**Note**: The training configuration of `ResNet50` on `LogoDet-3k` is used here as an example to explain the meaning of each parameter in detail. [Configure path](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/Logo/ResNet50_ReID.yaml). Only parameters that are distinct from the classification model are presented here.
<a name="3.1"></a>
#### 3.1 Architecture
| Parameter name | Specific meaning | Default value | Optional value |
| ---------------------- | ------------------------------------------------------------ | --------------------------- | ------------------------------------------------------------ |
| name | Model arch | "RecModel" | ["RecModel"] |
| infer_output_key | Inference output value | "feature" | ["feature", "logits"] |
| infer_add_softmax | Whether to add softmax to the inference output | False | [True, False] |
| Backbone.name | Backbone name | ResNet50_last_stage_stride1 | other backbone provided by PaddleClas |
| Backbone.pretrained | Backbone pre-trained model | True | Boolean value or pre-trained model path |
| BackboneStopLayer.name | The name of the output layer in Backbone | True | The `full_name` of the feature output layer in Backbone |
| Neck.name | The name of the Neck part | VehicleNeck | the dictionary structure to be passed in, the specific input parameters for the Neck network layer |
| Neck.in_channels | Input dimension size of the Neck part | 2048 | the same as the output dimension of the layer specified by BackboneStopLayer.name |
| Neck.out_channels | Output dimension size of the Neck part, i.e. the feature dimension size | 512 | int |
| Head.name | Network Head part name | CircleMargin | ArcMargin, etc. |
| Head.embedding_size | Feature dimension size | 512 | Consistent with Neck.out_channels |
| Head.class_num | number of classes | 3000 | int |
| Head.margin | margin value in CircleMargin | 0.35 | float |
| Head.scale | scale value in CircleMargin | 64 | int |
**Note**:
1. In PaddleClas, the `Neck` part is the connection between the Backbone and the embedding layer, and the `Head` part is the connection between the embedding layer and the classification layer.
2. `BackboneStopLayer.name` can be obtained by visualizing the model; for visualization you can refer to [Netron](https://github.com/lutzroeder/netron) or [visualdl](https://github.com/PaddlePaddle/VisualDL).
3. Calling `tools/export_model.py` converts the model weights to an inference model, where the `infer_add_softmax` parameter controls whether to append the Softmax activation function; the default in the code is `True` (in the classification task the last output layer is followed by the Softmax activation function). In the recognition task, the activation function is not required for the feature layer, so it should be set to `False` here.
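For orientation, the parameters in the table above map to an `Arch` section roughly like the sketch below; the values are the defaults from the table, and the `BackboneStopLayer.name` placeholder must be replaced with the actual layer name (see Note 2). Refer to the linked `ResNet50_ReID.yaml` for the exact configuration.
```yaml
Arch:
  name: RecModel
  infer_output_key: feature
  infer_add_softmax: False
  Backbone:
    name: ResNet50_last_stage_stride1
    pretrained: True
  BackboneStopLayer:
    name: "..."            # full_name of the Backbone feature output layer
  Neck:
    name: VehicleNeck
    in_channels: 2048      # must match the output size of BackboneStopLayer
    out_channels: 512
  Head:
    name: CircleMargin
    embedding_size: 512    # consistent with Neck.out_channels
    class_num: 3000
    margin: 0.35
    scale: 64
```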
<a name="3.2"></a>
#### 3.2 Evaluation metric
| Parameter name | Specific meaning | Default value | Optional value |
| -------------- | --------------------------- | --------------- | ---------------- |
| Recallk | Recall rate | [1, 5] | list, int |
| mAP | Average retrieval precision | None | None |

@ -0,0 +1,10 @@
models_training
================================
.. toctree::
:maxdepth: 2
config_description_en.md
recognition_en.md
classification_en.md
train_strategy_en.md

@ -0,0 +1,341 @@
# Image Recognition
Image recognition, in PaddleClas, means that the system is able to recognize the label of a given query image. Broadly speaking, image classification falls under image recognition, but ordinary image classification can only discriminate the learned categories and requires retraining to add new ones. The image recognition in PaddleClas, however, only needs to update the corresponding search library to identify the category of unfamiliar images without retraining the model, which not only significantly improves the usability of the recognition system but also reduces the demand for model updates, facilitating users' deployment of the application.
For an image to be queried, the image recognition process in PaddleClas is divided into three main parts:
1. Mainbody Detection: for a given query image, the mainbody detector first identifies the object, thus removing useless background information to improve the recognition accuracy.
2. Feature Extraction: for each candidate region of mainbody detection, feature extraction is performed by the feature model
3. Vector Search: the extracted features are compared with the vectors in the feature gallery for similarity to obtain their label information
The feature gallery is built in advance using the labeled image datasets. The complete image recognition system is shown in the figure below.
![img](../../images/structure.jpg)
To experience the whole image recognition system, or learn how to build a feature gallery, please refer to [Quick Start of Image Recognition](../quick_start/quick_start_recognition_en.md), which explains the overall application process. The following parts expound on the training part of the above three steps.
Please first refer to the [Installation Guide](../installation/install_paddleclas_en.md) to configure the runtime environment.
## Catalogue
- [1. Mainbody Detection](#1)
- [2. Feature Model Training](#2)
- [2.1. Data Preparation](#2.1)
- [2.2 Single GPU-based Training and Evaluation](#2.2)
- [2.2.1 Model Training](#2.2.1)
- [2.2.2 Resume Training](#2.2.2)
- [2.2.3 Model Evaluation](#2.2.3)
- [2.3 Export Inference Model](#2.3)
- [3. Vector Search](#3)
- [4. Basic Knowledge](#4)
<a name="1"></a>
## 1. Mainbody Detection
The mainbody detection training is based on [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/develop). The only difference is that all the detection boxes in the mainbody detection task belong to the foreground, so it is necessary to modify the `category_id` of the detection boxes in the annotation file to 1 and change the `categories` mapping table of the whole annotation file to the following format, i.e., the whole category mapping table contains only `foreground`.
```
[{u'id': 1, u'name': u'foreground', u'supercategory': u'foreground'}]
```
For more information about the training method of mainbody detection, please refer to: [PaddleDetection Training Tutorial](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/GETTING_STARTED_cn.md#4-训练).
For more information on the introduction and download of the mainbody detection models provided in PaddleClas, please refer to: [PaddleDetection Tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md).
<a name="2"></a>
## 2. Feature Model Training
<a name="2.1"></a>
### 2.1 Data Preparation
- Go to PaddleClas directory.
```
# linux or mac. $path_to_PaddleClas indicates the root directory of PaddleClas, which the user needs to modify according to their real directory
cd $path_to_PaddleClas
```
- Go to the `dataset` directory. The dataset used here is [CUB_200_2011](http://vision.ucsd.edu/sites/default/files/WelinderEtal10_CUB-200.pdf), a fine-grained dataset with 200 different species of birds. Firstly, we need to download the dataset. For the download, please refer to the [Official Website](http://www.vision.caltech.edu/visipedia/CUB-200-2011.html).
```shell
# linux or mac
cd dataset
# Copy the downloaded data into a directory.
cp {Data storage path}/CUB_200_2011.tgz .
# Unzip
tar -xzvf CUB_200_2011.tgz
# Go to CUB_200_2011
cd CUB_200_2011
```
When using the dataset for image retrieval, we usually use the first 100 classes as the training set and the last 100 classes as the testing set, so we need to process the data accordingly to adapt it to the model training of image retrieval.
```shell
# Create train and test directories
mkdir train && mkdir test
# Divide data into training set with the first 100 classes and testing set with the last 100 classes.
ls images | awk -F "." '{if(int($1)<101)print "mv images/"$0" train/"int($1)}' | sh
ls images | awk -F "." '{if(int($1)>100)print "mv images/"$0" test/"int($1)}' | sh
# Generate train_list and test_list
tree -r -i -f train | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > train_list.txt
tree -r -i -f test | grep jpg | awk -F "/" '{print $0" "int($2) " "NR}' > test_list.txt
```
So far, we have the training set (in the `train` directory), the testing set (in the `test` directory), and the `train_list.txt` and `test_list.txt` of `CUB_200_2011`.
After data preparation, the `train` directory of `CUB_200_2011` should be
```
├── 1
│ ├── Black_Footed_Albatross_0001_796111.jpg
│ ├── Black_Footed_Albatross_0002_55.jpg
...
├── 10
│ ├── Red_Winged_Blackbird_0001_3695.jpg
│ ├── Red_Winged_Blackbird_0005_5636.jpg
...
```
`train_list.txt` should be
```
train/99/Ovenbird_0137_92639.jpg 99 1
train/99/Ovenbird_0136_92859.jpg 99 2
train/99/Ovenbird_0135_93168.jpg 99 3
train/99/Ovenbird_0131_92559.jpg 99 4
train/99/Ovenbird_0130_92452.jpg 99 5
...
```
The separator is a space, and the three columns are the path of the training image, the label, and a unique id.
The format of testing set is the same as the one of training set.
**Note**
- When the gallery dataset and query dataset are the same, in order to remove the first data retrieved (the retrieved images themselves do not need to be evaluated), each data needs to correspond to a unique id for subsequent evaluation of metrics such as mAP, recall@1, etc. Please refer to [Introduction to image retrieval datasets](#Introduction to Image Retrieval Datasets) for the analysis of gallery datasets and query datasets, and [Image retrieval evaluation metrics](#Image Retrieval Evaluation Metrics) for the evaluation of mAP, recall@1, etc.
Back to `PaddleClas` root directory.
```shell
# linux or mac
cd ../../
```
<a name="2.2"></a>
### 2.2 Single GPU-based Training and Evaluation
For training and evaluation on a single GPU, the `tools/train.py` and `tools/eval.py` scripts are recommended.
PaddleClas supports training with VisualDL to visualize the metrics. VisualDL, a visualization and analysis tool of PaddlePaddle, provides a variety of charts to show the trends of parameters and visualizes model structures, data samples, histograms of tensors, PR curves, ROC curves and high-dimensional data distributions. It enables users to understand the training process and the model structure more clearly and intuitively so as to optimize models efficiently. For more information, please refer to [VisualDL](../others/VisualDL_en.md).
<a name="2.2.1"></a>
#### 2.2.1 Model Training
Once you have prepared the configuration file, you can start training the image retrieval task in the following way. The method used by PaddleClas to train image retrieval is metric learning; refer to [metric learning](#metric learning) for more explanations.
```shell
# Single GPU
python3 tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Arch.Backbone.pretrained=True \
-o Global.device=gpu
# Multi GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Arch.Backbone.pretrained=True \
-o Global.device=gpu
```
`-c` is used to specify the path to the configuration file, and `-o` is used to specify the parameters that need to be modified or added, where `-o Arch.Backbone.pretrained=True` indicates that the Backbone part uses the pre-trained model. In addition, `Arch.Backbone.pretrained` can also specify the address of a specific model weight file, which needs to be replaced with the path to your own pre-trained model weight file when using it. `-o Global.device=gpu` indicates that the GPU is used for training. If you want to use a CPU for training, you need to set `Global.device` to `cpu`.
For more detailed training configuration, you can also modify the corresponding configuration file of the model directly. Refer to the [configuration document](config_description_en.md) for specific configuration parameters.
Run the above commands to check the output log, an example is as follows:
```
...
[Train][Epoch 1/50][Avg]CELoss: 6.59110, TripletLossV2: 0.54044, loss: 7.13154
...
[Eval][Epoch 1][Avg]recall1: 0.46962, recall5: 0.75608, mAP: 0.21238
...
```
The Backbone here is MobileNetV1, if you want to use other backbone, you can rewrite the parameter `Arch.Backbone.name`, for example by adding `-o Arch.Backbone.name={other Backbone}` to the command. In addition, as the input dimension of the `Neck` section differs between models, replacing a Backbone may require rewriting the input size here in a similar way to replacing the Backbone's name.
In the Training Loss section, [CELoss](../../../ppcls/loss/celoss.py) and [TripletLossV2](../../../ppcls/loss/triplet.py) are used here with the following configuration files:
```
Loss:
Train:
- CELoss:
weight: 1.0
- TripletLossV2:
weight: 1.0
margin: 0.5
```
The final total Loss is a weighted sum of all Losses, where weight defines the weight of a particular Loss in the final total. If you want to replace other Losses, you can also change the Loss field in the configuration file, for the currently supported Losses please refer to [Loss](../../../ppcls/loss).
<a name="2.2.2"></a>
#### 2.2.2 Resume Training
If the training task is terminated for some reason, it can be recovered by loading the checkpoint weights file and continuing training:
```shell
# Single card
python3 tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.checkpoints="./output/RecModel/epoch_5" \
-o Global.device=gpu
# Multi card
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch tools/train.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.checkpoints="./output/RecModel/epoch_5" \
-o Global.device=gpu
```
There is no need to modify the configuration file, just set the `Global.checkpoints` parameter when continuing training, indicating the path to the loaded breakpoint weights file, using this parameter will load both the saved checkpoints weights and information about the learning rate, optimizer, etc.
**Note**
- The `-o Global.checkpoints` parameter does not need to contain the suffix of the checkpoint weights file. The above training command will generate the checkpoint weight files shown below during training; if you want to continue training from epoch `5`, just set the `Global.checkpoints` parameter to `"./output/RecModel/epoch_5"` and PaddleClas will automatically supplement the suffix.
```
output/
└── RecModel
├── best_model.pdopt
├── best_model.pdparams
├── best_model.pdstates
├── epoch_1.pdopt
├── epoch_1.pdparams
├── epoch_1.pdstates
.
.
.
```
<a name="2.2.3"></a>
#### 2.2.3 Model Evaluation
Model evaluation can be carried out with the following commands.
```shell
# Single card
python3 tools/eval.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.pretrained_model=./output/RecModel/best_model
# Multi card
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch tools/eval.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.pretrained_model=./output/RecModel/best_model
```
The above command will use `./configs/quick_start/MobileNetV1_retrieval.yaml` as a configuration file to evaluate the model obtained from the above training `./output/RecModel/best_model` for evaluation. You can also set up the evaluation by changing the parameters in the configuration file, or you can update the configuration with the `-o` parameter, as shown above.
Some of the configurable evaluation parameters are introduced as follows.
- `Arch.name`: the name of the model
- `Global.pretrained_model`: the path of the pre-trained model file of the model to be evaluated. Unlike `Global.Backbone.pretrained`, this pre-trained model contains the weights of the whole model instead of the Backbone only; when evaluating, the weights of the whole model need to be loaded.
- `Metric.Eval`: the metrics to be evaluated; recall@1, recall@5 and mAP are evaluated by default. When you do not want to evaluate a metric, you can remove the corresponding field from the configuration file; when you want to add an evaluation metric, you can refer to the [Metric](../../../ppcls/metric/metrics.py) section and add the relevant metric to `Metric.Eval` in the configuration file.
**Note**
- When loading the model to be evaluated, the path to the model file needs to be specified, but it is not necessary to include the file suffix, PaddleClas will automatically complete the `.pdparams` suffix, e.g. [2.2.2 Resume Training](#2.2.2).
- Metric learning models are generally not evaluated with TopkAcc.
<a name="2.3"></a>
### 2.3 Export Inference Model
By exporting the inference model, PaddlePaddle supports converting the trained model for prediction with the inference engine.
```shell
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/MobileNetV1_retrieval.yaml \
-o Global.pretrained_model=output/RecModel/best_model \
-o Global.save_inference_dir=./inference
```
`Global.pretrained_model` is used to specify the model file path, which still does not need to contain the model file suffix (see [2.2.2 Resume Training](#2.2.2)). When executed, it will generate the `./inference` directory, which contains the `inference.pdiparams`, `inference.pdiparams.info`, and `inference.pdmodel` files. `Global.save_inference_dir` allows you to specify the path to which the inference model is exported. The inference model saved here is truncated at the embedding feature level, i.e. the final output of the model is the n-dimensional embedding feature.
The above command will generate the model structure file (`inference.pdmodel`) and the model weights file (`inference.pdiparams`), which can then be used for inference using the inference engine. The process of inference using the inference model can be found in [Predictive inference based on the Python prediction engine](../inference_deployment/python_deploy_en.md).
<a name="3"></a>
## 3. Vector Search
Vector search in PaddleClas currently supports the following environments:
```
└── CPU
├── Linux
├── MacOS
└── Windows
```
[Faiss](https://github.com/facebookresearch/faiss) is adopted as a search library, which is an efficient one for feature search and clustering. A variety of similarity search algorithms are integrated in this library to meet different scenarios. In PaddleClas, three search algorithms are supported.
- **HNSW32**: A graph-based indexing method that boasts high retrieval accuracy and fast speed. However, the feature library only supports adding image features, not deleting them. (Default method)
- **IVF**: An inverted-index search method with fast speed but slightly lower precision. The feature library supports adding and deleting image features.
- **FLAT**: A brute-force search algorithm with the highest precision, but slower retrieval speed on large data volumes. The feature library supports adding and deleting image features.
See its detailed introduction in the [official document](https://github.com/facebookresearch/faiss/wiki).
`Faiss` can be installed as follows:
```
pip install faiss-cpu==1.7.1post2
```
If the above package cannot be imported properly, please `uninstall` it and then `install` it again, especially when you are using `Windows`.
<a name="4"></a>
## 4. Basic Knowledge
Image retrieval refers to the task of, given a query image containing a specific instance (e.g. a specific object, scene, item, etc.), finding the images containing the same instance in a database. Unlike image classification, image retrieval solves an open-set problem, where the training set may not contain the class of the image being recognised. The overall process of image retrieval is: firstly, the images are represented by suitable feature vectors; secondly, a nearest-neighbour search is performed on these feature vectors using Euclidean or cosine distances to find similar images in the gallery; finally, some post-processing techniques can be used to fine-tune the retrieval results and determine information such as the category of the image being recognised. Therefore, the key to the performance of an image retrieval algorithm lies in the quality of the feature vectors representing the images.
<a name="metric learning"></a>
- Metric Learning
Metric learning studies how to learn a distance function on a particular task so that the distance function can help nearest-neighbour based algorithms (kNN, k-means, etc.) achieve better performance. Deep Metric Learning is a method of metric learning that aims to learn a mapping from the original features to a low-dimensional dense vector space (the embedding space), such that, using commonly used distance functions (Euclidean distance, cosine distance, etc.) on the embedding space, similar objects are close together while objects of different classes are far apart. Deep metric learning has achieved very successful applications in the field of computer vision, such as face recognition, commodity recognition, image retrieval, pedestrian re-identification, etc. See [HERE](../algorithm_introduction/metric_learning_en.md) for detailed information.
<a name="Introduction to Image Retrieval Datasets"></a>
- Introduction to Image Retrieval Datasets
- Training Dataset: used to train the model so that it can learn the image features of the collection.
- Gallery Dataset: used to provide the gallery data for the image retrieval task. The gallery dataset can be the same as the training set or the test set, or different.
- Test Set (Query Dataset): used to test the quality of the model. Usually, features are extracted from each test image and matched against the features in the gallery to obtain the recognition results, and then the metrics of the whole test set are calculated based on these results.
<a name="Image Retrieval Evaluation Metrics"></a>
- Image Retrieval Evaluation Metrics
- recall: indicates the number of predicted positive cases with positive labels / the number of cases with positive labels
- recall@1: Number of predicted positive cases in top-1 with positive label / Number of cases with positive label
- recall@5: Number of all predicted positive cases in top-5 retrieved with positive label / Number of cases with positive label
- mean Average Precision(mAP)
- AP: AP refers to the average precision on different recall rates
- mAP: Average of the APs for all images in the test set
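These definitions can be written more formally. For a single query q whose relevant (positive) set in the gallery is G_q, with R_k the top-k retrieved results, N the length of the ranked list, and Q the number of queries, a common formulation consistent with the description above (standard retrieval definitions, not code taken from PaddleClas) is:
```latex
\text{recall@}k = \frac{|R_k \cap G_q|}{|G_q|}, \qquad
\text{AP}(q) = \frac{1}{|G_q|} \sum_{i=1}^{N} P(i)\,\mathrm{rel}(i), \qquad
\text{mAP} = \frac{1}{Q} \sum_{q=1}^{Q} \text{AP}(q)
```
where P(i) is the precision of the top-i results and rel(i) equals 1 if the i-th retrieved result is relevant and 0 otherwise.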

@ -0,0 +1,122 @@
# Tricks for Training
## Catalogue
- [1. Choice of Optimizers](#1)
- [2. Choice of Learning Rate and Learning Rate Declining Strategy](#2)
- [2.1 Concept of Learning Rate](#2.1)
- [2.2 Learning Rate Decline Strategy](#2.2)
- [2.3 Warmup Strategy](#2.3)
- [3. Choice of Batch_size](#3)
- [4. Choice of Weight_decay](#4)
- [5. Choice of Label_smoothing](#5)
- [6. Change the Crop Area and Stretch Transformation Degree of the Images for Small Models](#6)
- [7. Use Data Augmentation to Improve Accuracy](#7)
- [8. Determine the Tuning Strategy by Train_acc and Test_acc](#8)
- [9. Improve the Accuracy of Your Own Data Set with Existing Pre-trained Models](#9)
<a name="1"></a>
## 1. Choice of Optimizers
Since the development of deep learning, many researchers have worked on optimizers. The purpose of an optimizer is to make the loss function as small as possible, so as to find suitable parameters for a given task. At present, the main optimizers used in model training are SGD, RMSProp, Adam, AdaDelta and so on. The SGD optimizer with momentum is widely used in academia and industry, so most of the models we release are trained by the SGD optimizer with momentum. But the SGD optimizer with momentum has two disadvantages: the convergence speed is slow, and the initial learning rate is difficult to set. However, if the initial learning rate is set properly and the model is trained for enough iterations, models trained by SGD with momentum can reach higher accuracy than those trained by other optimizers. Some optimizers with adaptive learning rates, such as Adam and RMSProp, tend to converge faster, but their final accuracy is slightly worse. If you want a model to converge faster, we recommend using an optimizer with an adaptive learning rate; if you want higher accuracy, we recommend the SGD optimizer with momentum.
<a name="2"></a>
## 2. Choice of Learning Rate and Learning Rate Declining Strategy
The choice of learning rate is related to the optimizer, data set and tasks. Here we mainly introduce the learning rate of training ImageNet-1K with momentum + SGD as the optimizer and the choice of learning rate decline.
<a name="2.1"></a>
### 2.1 Concept of Learning Rate
The learning rate is the hyperparameter that controls the learning speed: the lower the learning rate, the slower the loss value changes. Although using a low learning rate ensures that you will not miss any local minimum, it also means that the convergence speed is slow, especially when the gradient is trapped in a plateau region.
<a name="2.2"></a>
### 2.2 Learning Rate Decline Strategy
During training, if we always use the same learning rate, we cannot obtain the model with the highest accuracy, so the learning rate should be adjusted during training. In the early stage of training, the weights are in a randomly initialized state and far from the optimum, so a relatively large learning rate can be set for faster convergence. In the late stage of training, the weights are close to the optimal values, which cannot be reached with a relatively large learning rate, so a relatively small learning rate should be used. During training, many researchers use the piecewise_decay strategy, which reduces the learning rate stepwise. For example, when training ResNet50, the initial learning rate we set is 0.1, the learning rate drops to 1/10 every 30 epochs, and training runs for 120 epochs in total. Besides piecewise_decay, many researchers have also proposed other ways to decrease the learning rate, such as polynomial_decay, exponential_decay and cosine_decay; among them, cosine_decay has become the preferred learning rate reduction method for improving model accuracy because it needs no extra hyperparameters and is relatively robust. The learning rate curves of cosine_decay and piecewise_decay are shown in the following figures; it is easy to observe that during the entire training process, cosine_decay keeps a relatively large learning rate, so its convergence is slower, but the final accuracy is better than that of piecewise_decay.
![](../../images/models/lr_decay.jpeg)
In addition, we can also see from the figures that the number of epochs with a small learning rate in cosine_decay is smaller, which affects the final accuracy. So in order to make cosine_decay work better, it is recommended to use it with a large number of epochs, such as 200.
<a name="2.3"></a>
### 2.3 Warmup Strategy
If a large batch_size is adopted to train the neural network, we recommend the warmup strategy. As the name suggests, the warmup strategy lets the model warm up first: instead of directly using the initial learning rate at the beginning of training, a gradually increasing learning rate is used, and when the increasing learning rate reaches the initial learning rate, the learning rate reduction method mentioned above is used to decay it. Experiments show that when the batch size is large, the warmup strategy can improve the accuracy. For training with a large batch_size, such as MobileNetV3 training, we set the warmup epochs to 5 by default, that is, during the first 5 epochs the learning rate increases from 0 to the initial learning rate, and then learning rate decay begins. One possible configuration is sketched below.
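As a reference, warmup is configured through the `lr.warmup_epoch` field of the Optimizer section in the PaddleClas configuration file. A minimal sketch combining cosine decay with a 5-epoch warmup (the values are illustrative):
```yaml
Optimizer:
  lr:
    name: Cosine
    learning_rate: 0.1
    warmup_epoch: 5   # the learning rate rises from 0 to 0.1 over the first 5 epochs
```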
<a name="3"></a>
## 3. Choice of Batch_size
Batch_size is an important hyperparameter in neural network training: it determines how many samples are fed to the network at each iteration. In [1], the authors found experimentally that convergence accuracy is hardly affected as long as the learning rate is scaled linearly with the batch_size. When training on ImageNet, an initial learning rate of 0.1 is commonly paired with a batch_size of 256, so depending on the actual model size and GPU memory, you can set the learning rate to 0.1\*k and the batch_size to 256\*k.
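In other words, the ratio between the learning rate and the batch_size is kept the same as in the reference setting (0.1 and 256). A minimal sketch of this linear scaling rule:

```python
def scaled_lr(batch_size, base_batch_size=256, base_lr=0.1):
    # Linear scaling rule: keep lr / batch_size constant.
    return base_lr * batch_size / base_batch_size

print(scaled_lr(512))   # 0.2
print(scaled_lr(1024))  # 0.4
```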
<a name="4"></a>
## 4. Choice of Weight_decay
Overfitting is a common problem in machine learning: a model that performs well on the training data performs poorly on the test data. Convolutional neural networks also suffer from overfitting, and many regularization techniques have been proposed to avoid it; weight_decay is one of the most widely used. L2 regularization (weight_decay) is added to the loss function, and with its help the network weights tend to take smaller values, so the parameters of the entire network are pushed towards 0 and the generalization of the model improves accordingly. In different deep learning frameworks, the coefficient of the L2 regularization term goes by different names; in Paddle it is called L2_decay, so that name is used below. The larger the coefficient, the more the model tends to underfit. For training on ImageNet, this parameter is set to 1e-4 for most networks. For some small networks, such as the MobileNet series, the value is set to 1e-5 ~ 4e-5 to avoid underfitting. The choice also depends on the dataset: when the dataset is large, the network tends to underfit and the value can be reduced appropriately; when the dataset is small, the network tends to overfit and the value can be increased appropriately. The following table shows the accuracy of MobileNetV1_x0_25 on ImageNet-1k with different L2_decay values. Since MobileNetV1_x0_25 is a relatively small network, a large L2_decay makes it underfit, so for this network 3e-5 is a better choice than 1e-4.
| Model | L2_decay | Train acc1/acc5 | Test acc1/acc5 |
|:--:|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 1e-4 | 43.79%/67.61% | 50.41%/74.70% |
| MobileNetV1_x0_25 | 3e-5 | 47.38%/70.83% | 51.45%/75.45% |
In addition, the choice of L2_decay is related to whether other regularization is used during training. If the data augmentation is more complicated, which makes training more difficult, L2_decay can be reduced appropriately. The following table shows the accuracy of ResNet50 on ImageNet-1k with different L2_decay values. It is easy to observe that once training becomes harder, a smaller L2_decay helps to improve the accuracy of the model.
| Model | L2_decay | Train acc1/acc5 | Test acc1/acc5 |
|:--:|:--:|:--:|:--:|
| ResNet50 | 1e-4 | 75.13%/90.42% | 77.65%/93.79% |
| ResNet50 | 7e-5 | 75.56%/90.55% | 78.04%/93.74% |
In summary, L2_decay can be adjusted according to the specific task and model. A larger L2_decay is usually recommended for simple tasks or larger models, and a smaller L2_decay for complex tasks or smaller models.
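To make the role of L2_decay concrete, the following is a schematic NumPy sketch of how the L2 penalty enters the total loss; it is only an illustration, not the framework implementation (in Paddle the coefficient is usually passed to the optimizer's regularizer instead):

```python
import numpy as np

def total_loss(data_loss, weights, l2_decay=1e-4):
    # L2 regularization penalizes the squared magnitude of all weights,
    # pushing the network parameters towards smaller values.
    l2_penalty = l2_decay * sum(np.sum(w ** 2) for w in weights)
    return data_loss + l2_penalty

weights = [np.random.randn(64, 64), np.random.randn(64)]
print(total_loss(data_loss=0.7, weights=weights))
```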
<a name="5"></a>
## 5. Choice of Label_smoothing
Label_smoothing is a regularization method in deep learning whose full name is Label Smoothing Regularization (LSR). In a traditional classification task, the loss is the cross-entropy between the real one-hot label and the output of the neural network. Label smoothing turns the hard one-hot label into a soft label, so that the network no longer learns from hard labels but from soft labels with probability values, where the position corresponding to the true category has the largest probability and the other positions have small probabilities; the specific calculation can be found in the paper [2]. Label smoothing has an epsilon parameter describing how much the label is softened: the larger epsilon, the smaller the peak probability and the smoother the label; conversely, the label approaches a hard label. When training on ImageNet-1k, epsilon is usually set to 0.1. In experiments with ResNet50_vd, the accuracy with label_smoothing is higher than without it; the following table shows the performance of ResNet50_vd with and without label smoothing.
| Model | Use_label_smoothing | Test acc1 |
|:--:|:--:|:--:|
| ResNet50_vd | 0 | 77.9% |
| ResNet50_vd | 1 | 78.4% |
However, because label smoothing can be regarded as a regularization method, on relatively small models the accuracy improvement is not obvious or the accuracy even decreases. The following table shows the accuracy of ResNet18 on ImageNet-1k with and without label smoothing; it can be clearly seen that with label smoothing the accuracy of ResNet18 decreases.
| Model | Use_label_smoothing | Train acc1/acc5 | Test acc1/acc5 |
|:--:|:--:|:--:|:--:|
| ResNet18 | 0 | 69.81%/87.70% | 70.98%/89.92% |
| ResNet18 | 1 | 68.00%/86.56% | 70.81%/89.89% |
In summary, label_smoothing can effectively improve the accuracy of larger models, while it may reduce the accuracy of smaller models, so the model size and task difficulty should be evaluated before deciding whether to use it.
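The soft label used by label smoothing can be written in a few lines. The NumPy sketch below converts a hard label into its smoothed counterpart using the epsilon parameter described above (0.1 for ImageNet-1k training):

```python
import numpy as np

def smooth_label(label, num_classes, epsilon=0.1):
    # Hard one-hot label: 1 at the target class, 0 elsewhere.
    one_hot = np.eye(num_classes)[label]
    # Soft label: the target keeps most of the probability mass,
    # the remainder is spread uniformly over all classes.
    return one_hot * (1.0 - epsilon) + epsilon / num_classes

print(smooth_label(label=2, num_classes=5))
# [0.02 0.02 0.92 0.02 0.02]
```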
<a name="6"></a>
## 6. Change the Crop Area and Stretch Transformation Degree of the Images for Small Models
In the standard preprocessing of ImageNet-1k data, two parameters, scale and ratio, are defined in the random_crop function. They determine the size of the crop and the degree to which the image is stretched. The default range of scale is 0.08-1 (lower_scale-upper_scale) and the default range of ratio is 3/4-4/3 (lower_ratio-upper_ratio). For small networks, this kind of data augmentation makes the network underfit, resulting in lower accuracy. To improve accuracy, the data augmentation can be weakened, that is, the crop area can be enlarged or the stretching of the image weakened. Weaker image transformation can be achieved by increasing lower_scale or narrowing the gap between lower_ratio and upper_ratio. The following table lists the accuracy of MobileNetV2_x0_25 trained with different lower_scale values; both training and validation accuracy improve after enlarging the crop area. A sketch of the crop sampling logic is given after the table.
| Model | Scale Range | Train_acc1/acc5 | Test_acc1/acc5 |
|:--:|:--:|:--:|:--:|
| MobileNetV2_x0_25 | [0.08,1] | 50.36%/72.98% | 52.35%/75.65% |
| MobileNetV2_x0_25 | [0.2,1] | 54.39%/77.08% | 53.18%/76.14% |
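The sampling logic referenced above can be sketched as follows. This is a simplified version of the usual random resized crop (not the exact PaddleClas implementation): the crop area is drawn from the `scale` range relative to the source image and the aspect ratio from the `ratio` range, so raising lower_scale or narrowing the ratio range directly weakens the transformation.

```python
import math
import random

def sample_crop(img_w, img_h, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3)):
    # Crop area as a fraction of the source image area.
    target_area = img_w * img_h * random.uniform(*scale)
    # Aspect ratio of the crop, sampled in log space for symmetry.
    aspect = math.exp(random.uniform(math.log(ratio[0]), math.log(ratio[1])))
    w = int(round(math.sqrt(target_area * aspect)))
    h = int(round(math.sqrt(target_area / aspect)))
    return min(w, img_w), min(h, img_h)

# A larger lower_scale value gives larger (weaker-augmentation) crops on average.
print(sample_crop(640, 480, scale=(0.2, 1.0)))
```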
<a name="7"></a>
## 7. Use Data Augmentation to Improve Accuracy
In general, the size of the dataset is critical to the performance, but image annotation is expensive and annotated images are often scarce, which makes data augmentation particularly important. In the standard data augmentation for training on ImageNet-1k, two methods are mainly used: random_crop and random_flip. In recent years, more and more augmentation methods have been proposed, such as Cutout, Mixup, CutMix and AutoAugment, and experiments show that they can effectively improve model accuracy. The following table lists the performance of ResNet50 with 8 different data augmentation methods. Compared to the baseline, all of them are helpful, and CutMix is currently the most effective one. More information about data augmentation can be found in the [**Data Augmentation**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/image_augmentation/ImageAugment.html) chapter.
| Model | Data Augmentation | Test top-1 |
|:--:|:--:|:--:|
| ResNet50 | Baseline | 77.31% |
| ResNet50 | Auto-Augment | 77.95% |
| ResNet50 | Mixup | 78.28% |
| ResNet50 | Cutmix | 78.39% |
| ResNet50 | Cutout | 78.01% |
| ResNet50 | Gridmask | 77.85% |
| ResNet50 | Random-Augment | 77.70% |
| ResNet50 | Random-Erasing | 77.91% |
| ResNet50 | Hide-and-Seek | 77.43% |
<a name="8"></a>
## 8. Determine the Tuning Strategy by Train_acc and Test_acc
During training, the training set accuracy and validation set accuracy of each epoch are usually printed. Generally speaking, training is in a good state when the training accuracy is slightly higher than or equal to the validation accuracy. If the training accuracy is much higher than the validation accuracy, the task is overfitting and more regularization is needed, such as increasing L2_decay, using more data augmentation, or adding label smoothing. If the training accuracy is lower than the validation accuracy, the task is underfitting, and it is recommended to decrease L2_decay, use less data augmentation, increase the crop area of the images, weaken the stretching transformation, remove label_smoothing, and so on.
<a name="9"></a>
## 9. Improve the Accuracy of Your Own Data Set with Existing Pre-trained Models
In the field of computer vision, it has become common to load a pretrained model when training one's own task. Compared with training from random initialization, loading a pretrained model can usually improve the accuracy on a specific task. The pretrained models widely used in industry are generally obtained on the ImageNet-1k dataset. The fc layer of such a pretrained model is a matrix of shape k\*1000, where k is the number of input neurons, and the fc weights do not need to be loaded because the task is different. Regarding the learning rate, if your training dataset is particularly small (for example, fewer than 1,000 images), we recommend a small initial learning rate, such as 0.001 (with batch_size 256, the same below), to avoid destroying the pretrained weights with a large learning rate. If your training dataset is relatively large (more than 100,000 images), we suggest trying a larger initial learning rate, such as 0.01 or more.
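A common way to reuse the ImageNet-1k weights while discarding the fc layer is to filter the checkpoint before loading it. The snippet below is only a sketch under the assumption of a Paddle 2.x dygraph model and a local `.pdparams` file; the checkpoint path and the `"fc."` key prefix are illustrative and depend on the actual network definition.

```python
import paddle
from paddle.vision.models import resnet50

# Build the model for the new task (10 classes here, purely illustrative).
model = resnet50(num_classes=10)

# Load the ImageNet-1k checkpoint (hypothetical local path) and drop the
# fc weights, whose k x 1000 shape no longer matches the new task.
state_dict = paddle.load("ResNet50_pretrained.pdparams")
filtered = {k: v for k, v in state_dict.items() if not k.startswith("fc.")}
model.set_state_dict(filtered)
```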
> If you find this guide helpful, welcome to star our repo: [https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
## Reference
[1] P. Goyal, P. Dollár, R. B. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: training ImageNet in 1 hour. CoRR, abs/1706.02677, 2017.
[2] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015.

@ -0,0 +1,57 @@
# Use VisualDL to visualize the training
---
## Catalogue
* [1. Preface](#1)
* [2. Use VisualDL in PaddleClas](#2)
* [2.1 Set config and start training](#2.1)
* [2.2 Start VisualDL](#2.2)
<a name='1'></a>
## 1. Preface
VisualDL, a visualization analysis tool of PaddlePaddle, provides a variety of charts to show the trends of parameters, and visualizes model structures, data samples, histograms of tensors, PR curves, ROC curves and high-dimensional data distributions. It enables users to understand the training process and the model structure more clearly and intuitively so as to optimize models efficiently. For more information, please refer to [VisualDL](https://github.com/PaddlePaddle/VisualDL/).
<a name='2'></a>
## 2. Use VisualDL in PaddleClas
PaddleClas now supports using VisualDL to visualize the changes of learning rate, loss and accuracy during training.
<a name='2.1'></a>
### 2.1 Set config and start training
You only need to set the field `Global.use_visualdl` to `True` in the training config:
```yaml
# config.yaml
Global:
...
use_visualdl: True
...
```
PaddleClas will save the VisualDL logs to the subdirectory `vdl/` under the output directory specified by `Global.output_dir`. Then just start training as usual:
```shell
python3 tools/train.py -c config.yaml
```
<a name='2.2'></a>
### 2.2 Start VisualDL
After starting the training program, you can start the VisualDL service in a new terminal session:
```shell
visualdl --logdir ./output/vdl/
```
In the above command, `--logdir` specifies the directory of the VisualDL logs produced during training. VisualDL traverses the subdirectories of the specified directory to visualize all experimental results. You can also use the following options to set the IP and port of the VisualDL service:
* `--host`: IP address, default is 127.0.0.1
* `--port`: port number, default is 8040
For more information about the command, please refer to [VisualDL](https://github.com/PaddlePaddle/VisualDL/blob/develop/README.md#2-launch-panel).
Then you can open the address `127.0.0.1:8040` in the browser and view the training process:
![](../../images/VisualDL/train_loss.png)

@ -0,0 +1,21 @@
### Competition Support
PaddleClas stems from Baidu's visual business applications and its exploration of frontier visual capabilities. It has helped us achieve leading results in many key competitions, and continues to promote more frontier visual solutions and real-world applications.
* 1st place in 2018 Kaggle Open Images V4 object detection challenge
* 2nd place in 2019 Kaggle Open Images V5 object detection challenge
  * The report is available here: [https://arxiv.org/pdf/1911.07171.pdf](https://arxiv.org/pdf/1911.07171.pdf)
  * The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/featured_model/OIDV5_BASELINE_MODEL.md)
* 2nd place in Kaggle Landmark Retrieval Challenge 2019
  * The report is available here: [https://arxiv.org/abs/1906.03990](https://arxiv.org/abs/1906.03990)
  * The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/Research/tree/master/CV/landmark)
* 2nd place in Kaggle Landmark Recognition Challenge 2019
  * The report is available here: [https://arxiv.org/abs/1906.03990](https://arxiv.org/abs/1906.03990)
  * The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/Research/tree/master/CV/landmark)
* A-level certificate of three tasks: printed text OCR, face recognition and landmark recognition in the first multimedia information recognition technology competition

@ -0,0 +1,107 @@
# Guide to Feature Map Visualization
------
## Catalogue
- [1. Overview](#1)
- [2. Prepare Work](#2)
- [3. Model Modification](#3)
- [4. Results](#4)
<a name='1'></a>
## 1. Overview
The feature map is the representation of the input image inside the convolutional network, and studying it can help us understand and design the model better. Therefore, we provide this tool to visualize feature maps based on the dynamic graph.
<a name='2'></a>
## 2. Prepare Work
The first step is to select the model to be studied; here we choose ResNet50. Copy the model definition code [resnet.py](../../../ppcls/arch/backbone/legendary_models/resnet.py) to the [target directory](../../../ppcls/utils/feature_maps_visualization/), then download the [ResNet50 pretrained model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams) or fetch it with the command below.
```bash
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_pretrained.pdparams
```
For other pretrained models and the corresponding network code, please refer to the [model library](../../../ppcls/arch/backbone/) and [pretrained models](../models/models_intro_en.md).
<a name='3'></a>
## 3. Model Modification
After locating the feature map you need, store it in a variable (named `fm` below) and return it. Here we take the feature map after the stem layer of ResNet50 as an example.
Specify the feature map to be visualized in the forward function of ResNet50:
```python
def forward(self, x):
with paddle.static.amp.fp16_guard():
if self.data_format == "NHWC":
x = paddle.transpose(x, [0, 2, 3, 1])
x.stop_gradient = True
x = self.stem(x)
fm = x
x = self.max_pool(x)
x = self.blocks(x)
x = self.avg_pool(x)
x = self.flatten(x)
x = self.fc(x)
return x, fm
```
Then modify [fm_vis.py](../../../ppcls/utils/feature_maps_visualization/fm_vis.py) to import `ResNet50` and instantiate the `net` object:
```python
from resnet import ResNet50
net = ResNet50()
```
Finally, execute the script:
```
python tools/feature_maps_visualization/fm_vis.py \
-i the image you want to test \
-c channel_num -p pretrained model \
--show whether to show \
--interpolation interpolation method\
--save_path where to save \
--use_gpu whether to use gpu
```
Parameters:
- `-i`: path of the image file to be predicted, such as `./test.jpeg`
- `-c`: the channel (dimension) of the feature map to visualize, such as `5`
- `-p`: path of the weight file, such as `./ResNet50_pretrained`
- `--interpolation`: image interpolation method, default value: 1
- `--save_path`: path to save the visualization, such as `./tools/`
- `--use_gpu`: whether to use GPU for inference, default value: True
<a name='4'></a>
## 4. Results
- Input image
![](../../images/feature_maps/feature_visualization_input.jpg)
- Run the following feature map visualization script
```
python tools/feature_maps_visualization/fm_vis.py \
-i ./docs/images/feature_maps/feature_visualization_input.jpg \
-c 5 \
-p pretrained/ResNet50_pretrained/ \
--show=True \
--interpolation=1 \
--save_path="./output.png" \
--use_gpu=False
```
- The output feature map is saved as `output.png`, as shown below.
![](../../images/feature_maps/feature_visualization_output.jpg)

@ -0,0 +1,15 @@
others
================================
.. toctree::
:maxdepth: 2
transfer_learning_en.md
train_with_DALI_en.md
VisualDL_en.md
train_on_xpu_en.md
feature_visiualization_en.md
paddle_mobile_inference_en.md
competition_support_en.md
update_history_en.md
versions_en.md

@ -0,0 +1,130 @@
# Benchmark on Mobile
---
## Catalogue
* [1. Introduction](#1)
* [2. Evaluation Steps](#2)
* [2.1 Export the Inference Model](#2.1)
* [2.2 Download Benchmark Binary File](#2.2)
* [2.3 Inference benchmark](#2.3)
* [2.4 Model Optimization and Speed Evaluation](#2.4)
<a name='1'></a>
## 1. Introduction
[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) is a lightweight inference engine that is fully functional, easy to use and performs well. Its light weight comes from using fewer bits to represent the weights and activations of the neural network, which greatly reduces the model size, eases the limited storage space of mobile devices, and generally delivers faster inference than other frameworks.
In [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), we use Paddle-Lite to [evaluate the performance on mobile devices](../models/Mobile_en.md). In this section, we take the `MobileNetV1` model trained on the `ImageNet1k` dataset as an example to introduce how to use `Paddle-Lite` to evaluate model speed on a mobile device (evaluated on SD855).
<a name='2'></a>
## 2. Evaluation Steps
<a name='2.1'></a>
### 2.1 Export the Inference Model
* First, the model saved during training should be transformed into an inference model, which can be exported with `tools/export_model.py` as follows.
```shell
python tools/export_model.py -m MobileNetV1 -p pretrained/MobileNetV1_pretrained/ -o inference/MobileNetV1
```
Finally, the `model` and `params` files are saved in `inference/MobileNetV1`.
<a name='2.2'></a>
### 2.2 Download Benchmark Binary File
* Use the adb (Android Debug Bridge) tool to connect the Android phone and the PC for development and debugging. After installing adb and making sure the PC and the phone are successfully connected, use the following command to view the ARM version of the phone and select the pre-compiled library based on it.
```shell
adb shell getprop ro.product.cpu.abi
```
* Download Benchmark_bin File
```shell
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v8
```
If the ARM version is v7, the v7 benchmark_bin file should be downloaded with the following command.
```shell
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v7
```
<a name='2.3'></a>
### 2.3 Inference benchmark
After the PC and mobile phone are successfully connected, use the following command to start the model evaluation.
```
sh deploy/lite/benchmark/benchmark.sh ./benchmark_bin_v8 ./inference result_armv8.txt true
```
Where `./benchmark_bin_v8` is the path of the benchmark binary file, `./inference` is the path of all the models that need to be evaluated, `result_armv8.txt` is the result file, and the final parameter `true` means that the model will be optimized before evaluation. Eventually, the evaluation result file of `result_armv8.txt` will be saved in the current folder. The specific performances are as follows.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750
Threads=2 Warmup=10 Repeats=30
MobileNetV1 min = 18.26600 max = 18.14000 average = 18.21637
Threads=4 Warmup=10 Repeats=30
MobileNetV1 min = 10.03200 max = 9.94300 average = 9.97627
```
The output above lists the model inference speed under different numbers of threads; the unit is FPS. Taking the single-thread result as an example, the average speed of MobileNetV1 on SD855 is `30.79750 FPS`.
<a name='2.4'></a>
### 2.4 Model Optimization and Speed Evaluation
* In section 2.3 we mentioned that the model is optimized before evaluation; here you can optimize the model first and then directly load the optimized model for speed evaluation.
* Paddle-Lite
Paddle-Lite provides multiple strategies to automatically optimize the original training model, including quantization, subgraph fusion, hybrid scheduling, kernel optimization and so on. To make the optimization convenient and easy to use, Paddle-Lite provides the opt tool, which automatically completes the optimization steps and outputs a lightweight, optimized and executable model; it can be obtained from the [Paddle-Lite Model Optimization Page](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html). Here we take `macOS` as the development environment, download the [opt_mac](https://paddlelite-data.bj.bcebos.com/model_optimize_tool/opt_mac) optimization tool and run the following commands to optimize the model.
```shell
model_file="../MobileNetV1/model"
param_file="../MobileNetV1/params"
opt_models_dir="./opt_models"
mkdir ${opt_models_dir}
./opt_mac --model_file=${model_file} \
--param_file=${param_file} \
--valid_targets=arm \
--optimize_out_type=naive_buffer \
--prefer_int8_kernel=false \
--optimize_out=${opt_models_dir}/MobileNetV1
```
Here `model_file` and `param_file` are the paths of the exported model file and parameter file respectively. After the transformation succeeds, `MobileNetV1.nb` is saved in `opt_models`.
Use the benchmark_bin file to load the optimized model for evaluation. The commands are as follows.
```shell
bash benchmark.sh ./benchmark_bin_v8 ./opt_models result_armv8.txt
```
Finally, the result is saved in `result_armv8.txt` and shown as follows.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1_lite min = 30.89500 max = 30.78500 average = 30.84173
Threads=2 Warmup=10 Repeats=30
MobileNetV1_lite min = 18.25300 max = 18.11000 average = 18.18017
Threads=4 Warmup=10 Repeats=30
MobileNetV1_lite min = 10.00600 max = 9.90000 average = 9.96177
```
Taking the single-thread result as an example, the average speed of MobileNetV1 on SD855 is `30.84173 FPS`.
For more detailed parameter explanations and Paddle-Lite usage, please refer to the [Paddle-Lite docs](https://paddle-lite.readthedocs.io/zh/latest/).

@ -0,0 +1,87 @@
# Introduction to Image Classification Models on Kunlun (Continuously updated)
------
## Catalogue
- [1. Foreword](#1)
- [2. Training of Kunlun](#2)
- [2.1 ResNet50](#2.1)
- [2.2 MobileNetV3](#2.2)
- [2.3 HRNet](#2.3)
- [2.4 VGG16/19](#2.4)
<a name='1'></a>
## 1. Foreword
- This document describes the models currently supported by Kunlun and how to train these models on Kunlun devices. To install PaddlePaddle that supports Kunlun, please refer to [install_kunlun](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/09_hardware_support/xpu_docs/paddle_install_cn.html)
<a name='2'></a>
## 2. Training of Kunlun
- See [quick_start](../quick_start/quick_start_classification_new_user_en.md) for data sources and pretrained models. The training results on Kunlun are aligned with those on CPU/GPU.
<a name='2.1'></a>
### 2.1 ResNet50
- Command:
```
python3.7 ppcls/static/train.py \
-c ppcls/configs/quick_start/kunlun/ResNet50_vd_finetune_kunlun.yaml \
-o use_gpu=False \
-o use_xpu=True \
-o is_distributed=False
```
The difference from CPU/GPU training is the addition of `-o use_xpu=True`, which indicates that the training runs on a Kunlun device.
<a name='2.2'></a>
### 2.2 MobileNetV3
- Command
```
python3.7 ppcls/static/train.py \
-c ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
-o use_gpu=False \
-o use_xpu=True \
-o is_distributed=False
```
<a name='2.3'></a>
### 2.3 HRNet
- Command
```
python3.7 ppcls/static/train.py \
-c ppcls/configs/quick_start/kunlun/HRNet_W18_C_finetune_kunlun.yaml \
-o is_distributed=False \
-o use_xpu=True \
-o use_gpu=False
```
<a name='2.4'></a>
### 2.4 VGG16/19
- Command
```
python3.7 ppcls/static/train.py \
-c ppcls/configs/quick_start/kunlun/VGG16_finetune_kunlun.yaml \
-o use_gpu=False \
-o use_xpu=True \
-o is_distributed=False
python3.7 ppcls/static/train.py \
-c ppcls/configs/quick_start/kunlun/VGG19_finetune_kunlun.yaml \
-o use_gpu=False \
-o use_xpu=True \
-o is_distributed=False
```

@ -0,0 +1,79 @@
# Train with DALI
---
## Catalogue
* [1. Preface](#1)
* [2. Installing DALI](#2)
* [3. Using DALI](#3)
* [4. Train with FP16](#4)
<a name='1'></a>
## 1. Preface
[The NVIDIA Data Loading Library](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html) (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It can be used to build the DataLoader of PaddlePaddle.
Since deep learning relies on a large amount of data in the training stage, the data need to be loaded and preprocessed. These operations are usually executed on the CPU, which limits further improvement of the training speed and, especially when the batch_size is large, becomes the bottleneck. DALI can use the GPU to accelerate these operations and thereby further improve the training speed.
<a name='2'></a>
## 2. Installing DALI
DALI only supports Linux x64 with CUDA 10.2 or later.
* For CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
* For CUDA 11.0:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110
For more information about installing DALI, please refer to [DALI](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html).
<a name='3'></a>
## 3. Using DALI
PaddleClas supports training with DALI. Since DALI only supports GPU training, `CUDA_VISIBLE_DEVICES` needs to be set, and since DALI occupies GPU memory, some GPU memory needs to be reserved for it. To train with DALI, just set `Global.use_dali` to `True` in the training config, or start training with the following command:
```shell
# set the GPUs that can be seen
export CUDA_VISIBLE_DEVICES="0"
python ppcls/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml -o Global.use_dali=True
```
You can also train with multiple GPUs:
```shell
# set the GPUs that can be seen
export CUDA_VISIBLE_DEVICES="0,1,2,3"
# set the GPU memory used for neural network training, generally 0.8 or 0.7, and the remaining GPU memory is reserved for DALI
export FLAGS_fraction_of_gpu_memory_to_use=0.80
python -m paddle.distributed.launch \
--gpus="0,1,2,3" \
ppcls/train.py \
-c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml \
-o Global.use_dali=True
```
<a name='4'></a>
## 4. Train with FP16
Based on the above, using FP16 half-precision training can further improve the speed; refer to the following command.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
export FLAGS_fraction_of_gpu_memory_to_use=0.8
python -m paddle.distributed.launch \
--gpus="0,1,2,3" \
ppcls/train.py \
-c ./ppcls/configs/ImageNet/ResNet/ResNet50_fp16_dygraph.yaml
```

@ -0,0 +1,103 @@
# Transfer learning in image classification
Transfer learning is an important part of machine learning and is widely used in various fields such as text and images. Here we mainly introduce transfer learning in the field of image classification, often called domain transfer, for example transferring an ImageNet classification model to a specific image classification task such as flower classification.
---
## Catalogue
* [1. Hyperparameter search](#1)
* [1.1 Grid search](#1.1)
* [1.2 Bayesian search](#1.2)
* [2. Large-scale image classification](#2)
* [3. Reference](#3)
<a name='1'></a>
## 1. Hyperparameter search
ImageNet is the most widely used dataset for image classification, and a set of empirical hyperparameters has been summarized for it that yields high accuracy. However, when applied to a specific dataset, these hyperparameters may not be optimal. Two commonly used hyperparameter search methods can help us obtain better model hyperparameters.
<a name='1.1'></a>
### 1.1 Grid search
Grid search, also called exhaustive search, determines the optimal value by evaluating every solution in the search space and picking the best one. The method is simple and effective, but when the search space is large it consumes huge computing resources.
<a name='1.2'></a>
### 1.2 Bayesian search
Bayesian search, also called Bayesian optimization, starts by randomly selecting a group of hyperparameters in the search space. A Gaussian process is then used to update the hyperparameters, computing their expected mean and variance according to the performance of the previous hyperparameters. The larger the expected mean, the greater the probability of being close to the optimal solution; the larger the expected variance, the greater the uncertainty. Usually, a hyperparameter point with a large expected mean is called `exploitation`, and a point with a large variance is called `exploration`. An acquisition function is defined to balance the expected mean and variance, and the currently selected hyperparameter point is viewed as the position of the optimum with maximum probability.
Following the two search schemes above, we carried out experiments with a fixed scheme and the two search schemes on 8 open-source datasets. With reference to the experimental scheme of [1], we search over 4 hyperparameters; the search space and the experimental results are as follows:
- Fixed scheme.
```
lr=0.003, l2 decay=1e-4, label smoothing=False, mixup=False
```
- Search space of the hyperparameters.
```
lr: [0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0001]
l2 decay: [1e-3, 3e-4, 1e-4, 3e-5, 1e-5, 3e-6, 1e-6]
label smoothing: [False, True]
mixup: [False, True]
```
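For illustration, enumerating this search space directly shows where the 196 grid-search runs mentioned below come from (7 learning rates x 7 l2 decay values x 2 x 2 boolean options):

```python
from itertools import product

lrs = [0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0001]
l2_decays = [1e-3, 3e-4, 1e-4, 3e-5, 1e-5, 3e-6, 1e-6]
label_smoothing = [False, True]
mixup = [False, True]

# Grid search evaluates every combination exhaustively.
grid = list(product(lrs, l2_decays, label_smoothing, mixup))
print(len(grid))  # 196 configurations
```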
Grid search requires 196 runs, while Bayesian search needs about 10 times fewer. The baseline is trained with the ImageNet1k pretrained ResNet50_vd model and the fixed scheme. The following table shows the experiments.
| Dataset | Fix scheme | Grid search | Grid search time | Bayesian search | Bayesian search time|
| ------------------ | -------- | -------- | -------- | -------- | ---------- |
| Oxford-IIIT-Pets | 93.64% | 94.55% | 196 | 94.04% | 20 |
| Oxford-102-Flowers | 96.08% | 97.69% | 196 | 97.49% | 20 |
| Food101 | 87.07% | 87.52% | 196 | 87.33% | 23 |
| SUN397 | 63.27% | 64.84% | 196 | 64.55% | 20 |
| Caltech101 | 91.71% | 92.54% | 196 | 92.16% | 14 |
| DTD | 76.87% | 77.53% | 196 | 77.47% | 13 |
| Stanford Cars | 85.14% | 92.72% | 196 | 92.72% | 25 |
| FGVC Aircraft | 80.32% | 88.45% | 196 | 88.36% | 20 |
- The above experiments verify that Bayesian search only reduces the accuracy by 0% to 0.4% under the condition of reducing the number of searches by about 10 times compared to grid search.
- The search space can be expanded easily using Bayesian search.
<a name='2'></a>
## 2. Large-scale image classification
In practical applications, due to the lack of training data, the classification model trained on the ImageNet1k dataset is often used as the pretrained model for other image classification tasks. To further help solve practical problems, Baidu open-sourced a self-developed large-scale classification pretrained model based on ResNet50_vd, whose training data contains 100,000 categories and 43 million images. The pretrained model is available at the [**download link**](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/ResNet50_vd_10w_pretrained.pdparams).
We conducted transfer learning experiments on 6 self-collected datasets, using a fixed set of parameters as well as grid search, with 20 training epochs, the ResNet50_vd model, and an ImageNet pretraining accuracy of 79.12%. The dataset statistics and the comparison of model accuracy are as follows:
Fixed scheme
```
lr=0.001, l2 decay=1e-4, label smoothing=False, mixup=False
```
| Dataset | Statistics | **Pretrained model on ImageNet <br />Top-1(fixed)/Top-1(search)** | **Pretrained model on large-scale dataset<br />Top-1(fixed)/Top-1(search)** |
| --------------- | ----------------------------------------- | -------------------------------------------------------- | --------------------------------------------------------- |
| Flowers | class:102<br />train:5789<br />valid:2396 | 0.7779/0.9883 | 0.9892/0.9954 |
| Hand-painted stick figures | Class:18<br />train:1007<br />valid:432 | 0.8795/0.9196 | 0.9107/0.9219 |
| Leaves | class:6<br />train:5256<br />valid:2278 | 0.8212/0.8482 | 0.8385/0.8659 |
| Container vehicle | Class:115<br />train:4879<br />valid:2094 | 0.6230/0.9556 | 0.9524/0.9702 |
| Chair | class:5<br />train:169<br />valid:78 | 0.8557/0.9688 | 0.9077/0.9792 |
| Geology | class:4<br />train:671<br />valid:296 | 0.5719/0.8094 | 0.6781/0.8219 |
- The above experiments verify that, with fixed parameters, using the large-scale classification model as the pretrained model can in most cases improve performance on a new dataset compared with the ImageNet pretrained model. Hyperparameter search can further improve the model performance.
<a name='3'></a>
## 3. Reference
[1] Kornblith, Simon, Jonathon Shlens, and Quoc V. Le. "Do better imagenet models transfer better?." *Proceedings of the IEEE conference on computer vision and pattern recognition*. 2019.
[2] Kolesnikov, Alexander, et al. "Large Scale Learning of General Visual Representations for Transfer." *arXiv preprint arXiv:1912.11370* (2019).

@ -0,0 +1,55 @@
# Release Notes
- 2021.04.15
- Add `MixNet` and `ReXNet` pretrained models, `MixNet_L`'s Top-1 Acc on ImageNet-1k reaches 78.6% and `ReXNet_3_0` reaches 82.09%.
- 2021.01.27
* Add ViT and DeiT pretrained models, ViT's Top-1 Acc on ImageNet reaches 81.05%, and DeiT reaches 85.5%.
- 2021.01.08
  * Add support for the whl package and its usage; model inference can be done by simply installing paddleclas with pip.
- 2020.12.16
  * Add support for TensorRT when using cpp inference to obtain more obvious acceleration.
- 2020.12.06
* Add `SE_HRNet_W64_C_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 84.75%.
- 2020.11.23
* Add `GhostNet_x1_3_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 79.38%.
- 2020.11.09
* Add `InceptionV3` architecture and pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 79.1%.
* 2020.10.20
* Add `Res2Net50_vd_26w_4s_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 83.1%.
* Add `Res2Net101_vd_26w_4s_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 83.9%.
- 2020.10.12
* Add Paddle-Lite demo.
- 2020.10.10
* Add cpp inference demo.
* Improve FAQ tutorials.
* 2020.09.17
* Add `HRNet_W48_C_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 83.62%.
* Add `ResNet34_vd_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 79.72%.
* 2020.09.07
* Add `HRNet_W18_C_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 81.16%.
* Add `MobileNetV3_small_x0_35_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 55.55%.
* 2020.07.14
* Add `Res2Net200_vd_26w_4s_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 85.13%.
* Add `Fix_ResNet50_vd_ssld_v2` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 84.00%.
* 2020.06.17
* Add English documents.
* 2020.06.12
* Add support for training and evaluation on Windows or CPU.
* 2020.05.17
* Add support for mixed precision training.
* 2020.05.09
* Add user guide about Paddle Serving and Paddle-Lite.
* Add benchmark about FP16/FP32 on T4 GPU.
* 2020.04.14
* First commit.

@ -0,0 +1,60 @@
# Version Updates
------
## Catalogue
- [1. v2.3](#1)
- [2. v2.2](#2)
<a name='1'></a>
## 1. v2.3
- Model Update
- Add pre-training weights for lightweight models, including detection models and feature models
- Release PP-LCNet series of models, which are self-developed ones designed to run on CPU
  - Enable SwinTransformer, Twins and DeiT to support direct training from scratch and reach the accuracy reported in their papers.
- Basic framework capabilities
- Add DeepHash module, which supports feature model to directly export binary features
- Add PKSampler, which tackles the problem that feature models cannot be trained by multiple machines and cards
- Support PaddleSlim: support quantization, pruning training, and offline quantization of classification models and feature models
- Enable legendary models to support intermediate model output
- Support multi-label classification training
- Inference Deployment
- Replace the original feature retrieval library with Faiss to improve platform adaptability
- Support PaddleServing: support the deployment of classification models and image recognition process
- Versions of the Recommendation Library
- python: 3.7
- PaddlePaddle: 2.1.3
- PaddleSlim: 2.2.0
- PaddleServing: 0.6.1
<a name='2'></a>
## 2. v2.2
- Model Updates
  - Add models including LeViT, Twins, TNT, DLA, HardNet, RedNet and SwinTransformer
- Basic framework capabilities
- Divide the classification models into two categories
- legendary models: introduce TheseusLayer base class, add the interface to modify the network function, and support the networking data truncation and output
- model zoo: other common classification models
- Add the support of Metric Learning algorithm
- Add a variety of related loss algorithms, and the basic network module gears (allow the combination with backbone and loss) for convenient use
- Support both the general classification and metric learning-related training
- Support static graph training
- Classification training with dali acceleration supported
- Support fp16 training
- Application Updates
- Add specific application cases and related models of product recognition, vehicle recognition (vehicle fine-grained classification, vehicle ReID), logo recognition, animation character recognition
- Add a complete pipeline for image recognition, including detection module, feature extraction module, and vector search module
- Inference Deployment
- Add Mobius, Baidu's self-developed vector search module, to support the inference deployment of the image recognition system
- Image recognition, build feature library that allows batch_size>1
- Documents Update
- Add image recognition related documents
- Fix bugs in previous documents
- Versions of the Recommendation Library
- python: 3.7
- PaddlePaddle: 2.1.2

@ -0,0 +1,10 @@
quick_start
================================
.. toctree::
:maxdepth: 2
quick_start_classification_new_user_en.md
quick_start_classification_professional_en.md
quick_start_recognition_en.md
quick_start_multilabel_classification_en.md

@ -0,0 +1,194 @@
# Trial in 30mins(new users)
This tutorial is mainly for new users, that is, users who are at the introductory stage of deep learning theory, know some Python syntax, and can read simple code. It mainly covers using PaddleClas for image classification network training and model prediction.
---
## Catalogue
- [1. Basic knowledge](#1)
- [2. Environmental installation and configuration](#2)
- [3. Data preparation and processing](#3)
- [4. Model training](#4)
- [4.1 Use CPU for model training](#4.1)
- [4.1.1 Training without using pre-trained models](#4.1.1)
- [4.1.2 Use pre-trained models for training](#4.1.2)
- [4.2 Use GPU for model training](#4.2)
- [4.2.1 Training without using pre-trained models](#4.2.1)
- [4.2.2 Use pre-trained models for training](#4.2.2)
- [5. Model prediction](#5)
<a name="1"></a>
## 1. Basic knowledge
Image classification is a pattern classification problem, which is the most basic task in computer vision. Its goal is to classify different images into different categories. We will briefly explain some concepts that need to be understood during model training. We hope to be helpful to you who are experiencing PaddleClas for the first time:
- train/val/test dataset represents training set, validation set and test set respectively:
- Training dataset: used to train the model so that the model can recognize different types of features;
- Validation set (val dataset): the test set during the training process, which is convenient for checking the status of the model during the training process;
- Test dataset: After training the model, the test dataset is used to evaluate the results of the model.
- Pre-trained model
Using a pre-trained model trained on a larger dataset, that is, the weights of the parameters are preset, can help the model converge faster on the new dataset. Especially for some tasks with scarce training data, when the neural network parameters are very large, we may not be able to fully train the model with a small amount of training data. The method of loading the pre-trained model can be thought of as allowing the model to learn based on a better initial weight, so as to achieve better performance.
- epoch
The total number of training epochs of the model. The model passes through all the samples in the training set once, which is an epoch. When the difference between the error rate of the validation set and the error rate of the training set is small, the current number of epochs can be considered appropriate; when the error rate of the validation set first decreases and then becomes larger, it means that the number of epochs is too large and the number of epochs needs to be reduced. Otherwise, the model may overfit the training set.
- Loss Function
During the training process, measure the difference between the model output (predicted value) and the ground truth.
- Accuracy (Acc): the proportion of correctly predicted samples over the total number of samples
  - Top1 Acc: the prediction is counted as correct if the class with the highest predicted probability matches the ground-truth label;
  - Top5 Acc: the prediction is counted as correct if the ground-truth label appears among the 5 classes with the highest predicted probabilities; a small reference implementation follows this list.
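The two accuracy metrics can be computed directly from the predicted probabilities. The NumPy sketch below is only a reference implementation (not PaddleClas code) showing Top-1 and Top-k accuracy on a toy batch:

```python
import numpy as np

def topk_accuracy(probs, labels, k=1):
    # Indices of the k highest-probability classes for every sample.
    topk = np.argsort(probs, axis=1)[:, -k:]
    # A sample is correct if the true label is among those k classes.
    correct = [labels[i] in topk[i] for i in range(len(labels))]
    return float(np.mean(correct))

probs = np.array([[0.10, 0.70, 0.20],
                  [0.30, 0.25, 0.45]])
labels = np.array([1, 0])
print(topk_accuracy(probs, labels, k=1))  # 0.5
print(topk_accuracy(probs, labels, k=2))  # 1.0
```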
<a name="2"></a>
## 2. Environmental installation and configuration
For specific installation steps, please refer to [Paddle Installation Document](../installation/install_paddle_en.md), [PaddleClas Installation Document](../installation/install_paddleclas_en.md).
<a name="3"></a>
## 3. Data preparation and processing
Enter the PaddleClas directory:
```shell
# linux or mac $path_to_PaddleClas represents the root directory of PaddleClas, and users need to modify it according to their real directory.
cd $path_to_PaddleClas
```
Enter the `dataset/flowers102` directory, download and unzip the flowers102 dataset:
```shell
# linux or mac
cd dataset/
# If you want to download directly from the browser, you can copy the link and visit, then download and unzip
wget https://paddle-imagenet-models-name.bj.bcebos.com/data/flowers102.zip
# unzip
unzip flowers102.zip
```
If there is no `wget` command or if you are downloading in the Windows operating system, you need to copy the address to the browser to download, and unzip it to the directory `PaddleClas/dataset/`.
After the unzip operation is completed, there are three `.txt` files for training and testing under the directory `PaddleClas/dataset/flowers102`: `train_list.txt` (training set, 1020 images), `val_list.txt` (validation set, 1020 images), and `train_extra_list.txt` (larger training set, 7169 images). The format of each line in these files is: **image relative path** **image label_id** (note: separated by a space), and there is also a mapping file between label id and category name: `flowers102_label_list.txt`.
The image files of the flowers102 dataset are stored in the `dataset/flowers102/jpg` directory. The image examples are as follows:
<div align="center">
![](../../images/quick_start/Examples-Flower-102.png)
</div>
Return to the root directory of `PaddleClas`:
```shell
# linux or mac
cd ../../
# windows users can open the PaddleClas root directory
```
<a name="4"></a>
## 4. Model training
<a name="4.1"></a>
### 4.1 Use CPU for model training
Since training on the CPU is slow, ShuffleNetV2_x0_25 is taken as an example here. This model has a small amount of computation and runs faster on the CPU, but because the model is small, the accuracy of the trained model is also limited.
<a name="4.1.1"></a>
#### 4.1.1 Training without using pre-trained models
```shell
# If you are using the windows operating system, please enter the root directory of PaddleClas in cmd and execute this command:
python tools/train.py -c ./ppcls/configs/quick_start/new_user/ShuffleNetV2_x0_25.yaml
```
- The `-c` parameter is to specify the path of the configuration file for training, and the specific hyperparameters for training can be viewed in the `yaml` file
- The `Global.device` parameter in the `yaml` file is set to `cpu`, that is, the CPU is used for training (if not set, this parameter defaults to `gpu`)
- The `epochs` parameter in the `yaml` file is set to 20, indicating that 20 epoch iterations are performed on the entire data set. It is estimated that the training can be completed in about 20 minutes (different CPUs have slightly different training times). At this time, the training model is not sufficient. To improve the accuracy of the training model, please set this parameter to a larger value, such as **40**, the training time will be extended accordingly
<a name="4.1.2"></a>
#### 4.1.2 Use pre-trained models for training
```shell
python tools/train.py -c ./ppcls/configs/quick_start/new_user/ShuffleNetV2_x0_25.yaml -o Arch.pretrained=True
```
- The `-o Arch.pretrained` parameter can be set to `True` or `False`, or to the storage path of a pretrained model. When `True` is used, the pretrained weights are automatically downloaded to the local machine. Note: if a pretrained model path is given, do not append `.pdparams`.
You can compare whether to use the pre-trained model and observe the drop in loss.
<a name="4.2"></a>
### 4.2 Use GPU for model training
Since GPU training is faster and more complex models can be used, take ResNet50_vd as an example. Compared with ShuffleNetV2_x0_25, this model is more computationally intensive, and the accuracy of the trained model will be higher.
First, you must set the environment variables and use the 0th GPU for training:
- For Linux users:
```shell
export CUDA_VISIBLE_DEVICES=0
```
- For Windows users
```shell
set CUDA_VISIBLE_DEVICES=0
```
<a name="4.2.1"></a>
#### 4.2.1 Training without using pre-trained models
```shell
python tools/train.py -c ./ppcls/configs/quick_start/ResNet50_vd.yaml
```
After the training is completed, the `Top1 Acc` curve of the validation set is shown below, and the highest accuracy rate is 0.2735.
![](../../images/quick_start/r50_vd_acc.png)
<a name="4.2.2"></a>
#### 4.2.2 Use pre-trained models for training
Based on ImageNet1k classification pre-trained model for fine-tuning, the training script is as follows:
```shell
python tools/train.py -c ./ppcls/configs/quick_start/ResNet50_vd.yaml -o Arch.pretrained=True
```
**Note**: This training script uses GPU. If you use CPU, you can modify it as shown in [4.1 Use CPU for model training](#4.1) above.
The `Top1 Acc` curve of the validation set is shown below. The highest accuracy rate is `0.9402`. After loading the pre-trained model, the accuracy of the flowers102 data set has been greatly improved, and the absolute accuracy has increased by more than 65%.
![](../../images/quick_start/r50_vd_pretrained_acc.png)
<a name="5"></a>
## 5. Model prediction
After the training is completed, the trained model can be used to predict the image category. Take the trained ResNet50_vd model as an example, the prediction code is as follows:
```shell
cd $path_to_PaddleClas
python tools/infer.py -c ./ppcls/configs/quick_start/ResNet50_vd.yaml -o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg -o Global.pretrained_model=output/ResNet50_vd/best_model
```
The `Infer.infer_imgs` field indicates the path of a single image to predict. After running successfully, the sample result is as follows:
`[{'class_ids': [76, 51, 37, 33, 9], 'scores': [0.99998, 0.0, 0.0, 0.0, 0.0], 'file_name': 'dataset/flowers102/jpg/image_00001.jpg', 'label_names': ['passion flower', 'wild pansy', 'great masterwort', 'mexican aster', 'globe thistle']}]`
Of course, you can also use the trained ShuffleNetV2_x0_25 model for prediction, the code is as follows:
```shell
cd $path_to_PaddleClas
python tools/infer.py -c ./ppcls/configs/quick_start/new_user/ShuffleNetV2_x0_25.yaml -o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg -o Global.pretrained_model=output/ShuffleNetV2_x0_25/best_model
```
The `Infer.infer_imgs` field can also be set to the directory of the images to be tested (`dataset/flowers102/jpg/`). After running successfully, some sample results are as follows:
`[{'class_ids': [76, 51, 37, 33, 9], 'scores': [0.99998, 0.0, 0.0, 0.0, 0.0], 'file_name': 'dataset/flowers102/jpg/image_00001.jpg', 'label_names': ['passion flower', 'wild pansy', 'great masterwort', 'mexican aster', 'globe thistle']}, {'class_ids': [76, 51, 37, 33, 32], 'scores': [0.99999, 0.0, 0.0, 0.0, 0.0], 'file_name': 'dataset/flowers102/jpg/image_00002.jpg', 'label_names': ['passion flower', 'wild pansy', 'great masterwort', 'mexican aster', 'love in the mist']}, {'class_ids': [76, 12, 39, 73, 78], 'scores': [0.99998, 0.0, 0.0, 0.0, 0.0], 'file_name': 'dataset/flowers102/jpg/image_00003.jpg', 'label_names': ['passion flower', 'king protea', 'lenten rose', 'rose', 'toad lily']}, {'class_ids': [76, 37, 34, 12, 9], 'scores': [0.86282, 0.11177, 0.00717, 0.00599, 0.00397], 'file_name': 'dataset/flowers102/jpg/image_00004.jpg', 'label_names': ['passion flower', 'great masterwort', 'alpine sea holly', 'king protea', 'globe thistle']}, {'class_ids': [76, 37, 33, 51, 69], 'scores': [0.9999, 1e-05, 1e-05, 0.0, 0.0], 'file_name': 'dataset/flowers102/jpg/image_00005.jpg', 'label_names': ['passion flower', 'great masterwort', 'mexican aster', 'wild pansy', 'tree poppy']}, {'class_ids': [76, 37, 51, 33, 73], 'scores': [0.99999, 0.0, 0.0, 0.0, 0.0], 'file_name': 'dataset/flowers102/jpg/image_00006.jpg', 'label_names': ['passion flower', 'great masterwort', 'wild pansy', 'mexican aster', 'rose']}, {'class_ids': [76, 37, 12, 91, 30], 'scores': [0.98746, 0.00211, 0.00201, 0.00136, 0.0007], 'file_name': 'dataset/flowers102/jpg/image_00007.jpg', 'label_names': ['passion flower', 'great masterwort', 'king protea', 'bee balm', 'carnation']}, {'class_ids': [76, 37, 81, 77, 72], 'scores': [0.99976, 3e-05, 2e-05, 2e-05, 1e-05], 'file_name': 'dataset/flowers102/jpg/image_00008.jpg', 'label_names': ['passion flower', 'great masterwort', 'clematis', 'lotus', 'water lily']}, {'class_ids': [76, 37, 13, 12, 34], 'scores': [0.99646, 0.00144, 0.00092, 0.00035, 0.00027], 'file_name': 'dataset/flowers102/jpg/image_00009.jpg', 'label_names': ['passion flower', 'great masterwort', 'spear thistle', 'king protea', 'alpine sea holly']}, {'class_ids': [76, 37, 34, 33, 51], 'scores': [0.99999, 0.0, 0.0, 0.0, 0.0], 'file_name': 'dataset/flowers102/jpg/image_00010.jpg', 'label_names': ['passion flower', 'great masterwort', 'alpine sea holly', 'mexican aster', 'wild pansy']}]`
The length of the returned list is the batch_size.

@ -0,0 +1,306 @@
# Trial in 30mins(professional)
Here is a quick start tutorial for professional users on the Linux operating system. Based on the CIFAR-100 dataset, you can quickly experience training different models, loading different pretrained models, the SSLD knowledge distillation solution, and data augmentation. Please refer to the [Installation Guide](../installation/install_paddleclas_en.md) to configure the environment and clone the PaddleClas code.
------
## Catalogue
- [1. Data and model preparation](#1)
- [1.1 Data preparation](#1.1)
- [1.1.1 Prepare CIFAR100](#1.1.1)
- [2. Model training](#2)
- [2.1 Single label training](#2.1)
- [2.1.1 Training without loading the pre-trained model](#2.1.1)
- [2.1.2 Transfer learning](#2.1.2)
- [3. Data Augmentation](#3)
- [3.1 Data augmentation-Mixup](#3.1)
- [4. Knowledge distillation](#4)
- [5. Model evaluation and inference](#5)
- [5.1 Single-label classification model evaluation and inference](#5.1)
- [5.1.1 Single-label classification model evaluation](#5.1.1)
- [5.1.2 Single-label classification model prediction](#5.1.2)
- [5.1.3 Single-label classification uses inference model for model inference](#5.1.3)
<a name="1"></a>
## 1. Data and model preparation
<a name="1.1"></a>
### 1.1 Data preparation
* Enter the PaddleClas directory.
```
cd path_to_PaddleClas
```
<a name="1.1.1"></a>
#### 1.1.1 Prepare CIFAR100
* Enter the `dataset/` directory, download and unzip the CIFAR100 dataset.
```shell
cd dataset
wget https://paddle-imagenet-models-name.bj.bcebos.com/data/CIFAR100.tar
tar -xf CIFAR100.tar
cd ../
```
<a name="2"></a>
## 2. Model training
<a name="2.1"></a>
### 2.1 Single label training
<a name="2.1.1"></a>
#### 2.1.1 Training without loading the pre-trained model
* Based on the ResNet50_vd model, the training script is shown below.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Global.output_dir="output_CIFAR"
```
The highest accuracy of the validation set is around 0.415.
<a name="2.1.2"></a>
#### 2.1.2 Transfer learning
* Based on ImageNet1k classification pre-training model ResNet50_vd_pretrained (accuracy rate 79.12%) for fine-tuning, the training script is shown below.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Global.output_dir="output_CIFAR" \
-o Arch.pretrained=True
```
The highest accuracy of the validation set is about 0.718. After loading the pre-trained model, the accuracy of the CIFAR100 data set has been greatly improved, with an absolute accuracy increase of 30%.
* Based on ImageNet1k classification pre-training model ResNet50_vd_ssld_pretrained (accuracy rate of 82.39%) for fine-tuning, the training script is shown below.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Global.output_dir="output_CIFAR" \
-o Arch.pretrained=True \
-o Arch.use_ssld=True
```
In the final CIFAR100 verification set, the top-1 accuracy is 0.73. Compared with the fine-tuning of the pre-trained model with a top-1 accuracy of 79.12%, the top-1 accuracy of the new data set can be increased by 1.2% again.
* Replace the backbone with MobileNetV3_large_x1_0 and fine-tune; the training script is shown below.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV3_large_x1_0_CIFAR100_finetune.yaml \
-o Global.output_dir="output_CIFAR" \
-o Arch.pretrained=True
```
The highest accuracy of the validation set is about 0.601, which is nearly 12% lower than that of ResNet50_vd.
<a name="3"></a>
## 3. Data Augmentation
PaddleClas contains many data augmentation methods, such as Mixup, Cutout, and RandomErasing. For details, please refer to the [Data augmentation chapter](../algorithm_introduction/DataAugmentation_en.md).
<a name="3.1"></a>
### 3.1 Data augmentation-Mixup
Based on the training method in the [Data Augmentation Chapter](../algorithm_introduction/DataAugmentation_en.md) (Section 3.3), train with the Mixup data augmentation method; the training script is shown below.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_mixup_CIFAR100_finetune.yaml \
-o Global.output_dir="output_CIFAR"
```
The final accuracy on the CIFAR100 validation set is 0.73; using data augmentation increases the model accuracy by about 1.2% again.
* **Note**
* For other data augmentation configuration files, please refer to the configuration files in `ppcls/configs/ImageNet/DataAugment/`.
* The number of epochs for training CIFAR100 is small, so the accuracy of the validation set may fluctuate by about 1%.
<a name="4"></a>
## 4. Knowledge distillation
PaddleClas includes a self-developed SSLD knowledge distillation scheme; for details, please refer to the [Knowledge Distillation Chapter](../algorithm_introduction/knowledge_distillation_en.md). This section uses knowledge distillation to train the MobileNetV3_large_x1_0 model, with the ResNet50_vd model trained in section 2.1.2 as the teacher. First, save that ResNet50_vd model to the specified directory. The script is as follows.
```shell
mkdir pretrained
cp -r output_CIFAR/ResNet50_vd/best_model.pdparams ./pretrained/
```
The configuration file specifies the model name, the teacher and student model settings, the pre-trained weight paths, and the freeze_params configuration, as follows. The two values in `freeze_params_list` indicate whether the teacher model and the student model freeze their parameters during training, respectively.
```yaml
Arch:
name: "DistillationModel"
# if not null, its lengths should be same as models
pretrained_list:
# if not null, its lengths should be same as models
freeze_params_list:
- True
- False
models:
- Teacher:
name: ResNet50_vd
pretrained: "./pretrained/best_model"
- Student:
name: MobileNetV3_large_x1_0
pretrained: True
```
The loss configuration is as follows, where the training loss is the cross entropy between the outputs of the student model and the teacher model, and the validation loss is the cross entropy between the output of the student model and the ground-truth label.
```yaml
Loss:
Train:
- DistillationCELoss:
weight: 1.0
model_name_pairs:
- ["Student", "Teacher"]
Eval:
- DistillationGTCELoss:
weight: 1.0
model_names: ["Student"]
```
The final training script is shown below.
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/R50_vd_distill_MV3_large_x1_0_CIFAR100.yaml \
-o Global.output_dir="output_CIFAR"
```
In the end, the accuracy on the CIFAR100 validation set was 64.4%. Using the teacher model for knowledge distillation, the accuracy of MobileNetV3 increased by 4.3%.
* **Note**
* In the distillation process, the pre-trained model used by the teacher model is the training result on the CIFAR100 dataset, and the student model uses the MobileNetV3_large_x1_0 pre-trained model with an accuracy of 75.32% on the ImageNet1k dataset.
  * The distillation process does not need real labels, so more unlabeled data can be used. In practice, you can generate a fake `train_list.txt` from unlabeled data and merge it with the real `train_list.txt`, then try it on your own data; a minimal sketch is shown below.
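The following is a minimal sketch of that workflow, assuming your unlabeled images live in a hypothetical `./dataset/unlabeled_images` directory and using `0` as a placeholder label; adjust the paths and the label format so they match your real `train_list.txt`.
```shell
# Build a fake list: assign the placeholder label 0 to every unlabeled image
# (the directory name and label are illustrative only).
find ./dataset/unlabeled_images -name "*.jpg" | awk '{print $0" 0"}' > fake_train_list.txt
# Merge the fake list with the real one for distillation training.
cat train_list.txt fake_train_list.txt > train_list_merged.txt
```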
<a name="5"></a>
## 5. Model evaluation and inference
<a name="5.1"></a>
### 5.1 Single-label classification model evaluation and inference
<a name="5.1.1"></a>
#### 5.1.1 Single-label classification model evaluation
After training the model, you can use the following commands to evaluate the accuracy of the model.
```bash
python3 tools/eval.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Global.pretrained_model="output_CIFAR/ResNet50_vd/best_model"
```
<a name="5.1.2"></a>
#### 5.1.2 Single-label classification model prediction
After training is completed, the trained model can be loaded for prediction. A complete example is provided in `tools/infer.py`; run the following command to perform model prediction:
```bash
python3 tools/infer.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Infer.infer_imgs=./dataset/CIFAR100/test/0/0001.png \
-o Global.pretrained_model=output_CIFAR/ResNet50_vd/best_model
```
<a name="5.1.3"></a>
#### 5.1.3 Single-label classification uses inference model for model inference
PaddlePaddle supports inference with its prediction engine, which requires exporting an inference model first. This section introduces how to use the prediction engine for inference.
First, export the trained model to an inference model:
```bash
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
-o Global.pretrained_model=output_CIFAR/ResNet50_vd/best_model
```
* By default, `inference.pdiparams`, `inference.pdmodel` and `inference.pdiparams.info` files will be generated in the `inference` folder.
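You can quickly verify the export result by listing that directory (a minimal check, assuming the default export path):
```shell
ls ./inference
# expected files: inference.pdiparams  inference.pdiparams.info  inference.pdmodel
```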
Use prediction engines for inference:
Enter the deploy directory:
```bash
cd deploy
```
Change the `inference_cls.yaml` file. Since the resolution used for training CIFAR100 is 32x32, the relevant resolution needs to be changed. The image preprocessing in the final configuration file is as follows:
```yaml
PreProcess:
transform_ops:
- ResizeImage:
resize_short: 36
- CropImage:
size: 32
- NormalizeImage:
scale: 0.00392157
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
```
Execute the command to make predictions. Since the default `class_id_map_file` is the mapping file of the ImageNet dataset, you need to set it to `None` here.
```bash
python3 python/predict_cls.py \
-c configs/inference_cls.yaml \
-o Global.infer_imgs=../dataset/CIFAR100/test/0/0001.png \
-o PostProcess.Topk.class_id_map_file=None
```

@ -0,0 +1,117 @@
# Quick Start of Multi-label Classification
Experience the training, evaluation, and prediction of multi-label classification based on the [NUS-WIDE-SCENE](https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html) dataset, a subset of the NUS-WIDE dataset. Please install PaddlePaddle and PaddleClas first; see [Paddle Installation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/installation) and [PaddleClas installation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/installation/install_paddleclas.md) for details.
## Catalogue
- [1. Data and Model Preparation](#1)
- [2. Model Training](#2)
- [3. Model Evaluation](#3)
- [4. Model Prediction](#4)
- [5. Predictive engine-based Prediction](#5)
- [5.1 Export inference model](#5.1)
- [5.2 Predictive engine-based Prediction](#5.2)
<a name="1"></a>
## 1. Data and Model Preparation
- Go to `PaddleClas`.
```
cd path_to_PaddleClas
```
- Create and go to `dataset/NUS-WIDE-SCENE`, download and unzip the NUS-WIDE-SCENE dataset.
```
mkdir dataset/NUS-WIDE-SCENE
cd dataset/NUS-WIDE-SCENE
wget https://paddle-imagenet-models-name.bj.bcebos.com/data/NUS-SCENE-dataset.tar
tar -xf NUS-SCENE-dataset.tar
```
- Return to `PaddleClas` root directory
```
cd ../../
```
<a name="2"></a>
## 2. Model Training
```
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
```
After training for 10 epochs, the best accuracy on the validation set should be around 0.95.
<a name="3"></a>
## 3. Model Evaluation
```
python3 tools/eval.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
<a name="4"></a>
## 4. Model Prediction
```
python3 tools/infer.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
You should obtain output similar to the following:
```
[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]
```
<a name="5"></a>
## 5. Predictive engine-based Prediction
<a name="5.1"></a>
### 5.1 Export inference model
```
python3 tools/export_model.py \
-c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
-o Arch.pretrained="./output/MobileNetV1/best_model"
```
By default, the inference model is saved under the current path `./inference`.
<a name="5.2"></a>
### 5.2 Predictive engine-based Prediction
Go to the `deploy` directory first:
```
cd ./deploy
```
Run inference and prediction through the prediction engine:
```
python3 python/predict_cls.py \
-c configs/inference_multilabel_cls.yaml
```
You should obtain output similar to the following:
```
0517_2715693311.jpg: class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
```

@ -0,0 +1,296 @@
# Quick Start of Recognition
This tutorial contains 3 parts: Environment Preparation, Image Recognition Experience, and Unknown Category Image Recognition Experience.
If the image category already exists in the image index database, you can refer to the chapter [Image Recognition Experience](#2) to complete the image recognition process. If you wish to recognize an image of an unknown category that is not included in the index database, you can refer to the chapter [Unknown Category Image Recognition Experience](#3) to build a new index and recognize it.
## Catalogue
* [1. Environment Preparation](#1)
* [2. Image Recognition Experience](#2)
* [2.1 Download and Unzip the Inference Model and Demo Data](#2.1)
* [2.2 Product Recognition and Retrieval](#2.2)
* [2.2.1 Single Image Recognition](#2.2.1)
* [2.2.2 Folder-based Batch Recognition](#2.2.2)
* [3. Unknown Category Image Recognition Experience](#3)
* [3.1 Prepare for the new images and labels](#3.1)
* [3.2 Build a new Index Library](#3.2)
* [3.3 Recognize the Unknown Category Images](#3.3)
<a name="1"></a>
## 1. Environment Preparation
* Installation: please refer to [Quick Installation](../installation/) to configure the PaddleClas environment.
* Use the following command to enter the `deploy` folder. All content and commands in this section must be run in the `deploy` folder.
```
cd deploy
```
<a name="2"></a>
## 2. Image Recognition Experience
The following tables list the mainbody detection model, the recognition inference models for 4 scenarios (Logo, Cartoon Face, Vehicle, Product), the download addresses of the test data, and the corresponding configuration files.
| Models Introduction | Recommended Scenarios | inference Model | Predict Config File | Config File to Build Index Database |
| ------------ | ------------- | -------- | ------- | -------- |
| Generic mainbody detection model | General Scenarios |[Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar) | - | - |
| Logo Recognition Model | Logo Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/logo_rec_ResNet50_Logo3K_v1.0_infer.tar) | [inference_logo.yaml](../../../deploy/configs/inference_logo.yaml) | [build_logo.yaml](../../../deploy/configs/build_logo.yaml) |
| Cartoon Face Recognition Model| Cartoon Face Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/cartoon_rec_ResNet50_iCartoon_v1.0_infer.tar) | [inference_cartoon.yaml](../../../deploy/configs/inference_cartoon.yaml) | [build_cartoon.yaml](../../../deploy/configs/build_cartoon.yaml) |
| Vehicle Fine-Grained Classification Model | Vehicle Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_cls_ResNet50_CompCars_v1.0_infer.tar) | [inference_vehicle.yaml](../../../deploy/configs/inference_vehicle.yaml) | [build_vehicle.yaml](../../../deploy/configs/build_vehicle.yaml) |
| Product Recognition Model | Product Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
| Vehicle ReID Model | Vehicle ReID Scenario | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/vehicle_reid_ResNet50_VERIWild_v1.0_infer.tar) | - | - |
| Models Introduction | Recommended Scenarios | inference Model | Predict Config File | Config File to Build Index Database |
| ------------ | ------------- | -------- | ------- | -------- |
| Lightweight generic mainbody detection model | General Scenarios |[Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | - | - |
| Lightweight generic recognition model | General Scenarios | [Model Download Link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar) | [inference_product.yaml](../../../deploy/configs/inference_product.yaml) | [build_product.yaml](../../../deploy/configs/build_product.yaml) |
Demo data in this tutorial can be downloaded here: [download link](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_en_v1.1.tar).
**Attention**
1. If you do not have `wget` installed on Windows, you can download the model by copying the link into your browser and unzip it in the appropriate folder; Linux or macOS users can right-click to copy the download link and download it via the `wget` command.
2. If you want to install `wget` on macOS, you can run the following commands.
```shell
# install homebrew
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)";
# install wget
brew install wget
```
3. If you want to install `wget` on Windows, you can refer to [link](https://www.cnblogs.com/jeshy/p/10518062.html). If you want to install `tar` on Windows, you can refer to [link](https://www.cnblogs.com/chooperman/p/14190107.html).
4. The lightweight generic recognition model reuses the predict config file and the index-building config file of the server-side product recognition model; you only need to modify the model path to complete index building and prediction.
* You can download and unzip the data and models by following the commands below.
```shell
mkdir models
cd models
# Download and unzip the inference model
wget {Models download link} && tar -xf {Name of the tar archive}
cd ..
# Download the demo data and unzip
wget {Data download link} && tar -xf {Name of the tar archive}
```
<a name="2.1"></a>
### 2.1 Download and Unzip the Inference Model and Demo Data
Take product recognition as an example: download the detection model, the recognition model, and the product recognition demo data with the following commands.
```shell
mkdir models
cd models
# Download the generic detection inference model and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
# Download the product recognition inference model and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd ..
# Download the demo data and unzip it
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_en_v1.1.tar && tar -xf recognition_demo_data_en_v1.1.tar
```
Once unpacked, the `recognition_demo_data_v1.1` folder should have the following file structure.
```
├── recognition_demo_data_v1.1
│ ├── gallery_cartoon
│ ├── gallery_logo
│ ├── gallery_product
│ ├── gallery_vehicle
│ ├── test_cartoon
│ ├── test_logo
│ ├── test_product
│ └── test_vehicle
├── ...
```
Here, the original images used to build the index are in the `gallery_xxx` folders, and the test images are in the `test_xxx` folders. You can check a specific folder for more details.
The `models` folder should have the following file structure.
```
├── product_ResNet50_vd_aliproduct_v1.0_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
├── ppyolov2_r50vd_dcn_mainbody_v1.0_infer
│ ├── inference.pdiparams
│ ├── inference.pdiparams.info
│ └── inference.pdmodel
```
**Attention**
If you want to use the lightweight generic recognition model, you need to re-extract the features of the demo data and rebuild the index, as follows:
```shell
python3.7 python/build_gallery.py -c configs/build_product.yaml -o Global.rec_inference_model_dir=./models/general_PPLCNet_x2_5_lite_v1.0_infer
```
<a name="2.2"></a>
### 2.2 Product Recognition and Retrieval
Take the product recognition demo as an example to show the recognition and retrieval process (to try other recognition and retrieval scenarios, download and unzip the corresponding demo data and model, then replace the corresponding configuration file to complete the prediction).
**Note:** `faiss` is used as the search library. Install it as follows:
```
pip install faiss-cpu==1.7.1post2
```
If an error occurs when running `import faiss`, please uninstall and then reinstall `faiss`; this is most common on Windows.
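A minimal way to do this, assuming the pinned version above:
```shell
# Remove the existing faiss installation and reinstall the pinned version.
pip uninstall -y faiss-cpu
pip install faiss-cpu==1.7.1post2
```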
<a name="2.2.1"></a>
#### 2.2.1 Single Image Recognition
Run the following commands to perform recognition and retrieval on the image `./recognition_demo_data_v1.1/test_product/daoxiangcunjinzhubing_6.jpg`.
```shell
# use the following command to predict using GPU.
python3.7 python/predict_system.py -c configs/inference_product.yaml
# use the following command to predict using CPU
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.use_gpu=False
```
The image to be retrieved is shown below.
![](../../images/recognition/product_demo/query/daoxiangcunjinzhubing_6.jpg)
The final output is shown below.
```
[{'bbox': [287, 129, 497, 326], 'rec_docs': 'Daoxaingcun Golden Piggie Cake', 'rec_scores': 0.8309420347213745}, {'bbox': [99, 242, 313, 426], 'rec_docs': 'Daoxaingcun Golden Piggie Cake', 'rec_scores': 0.7245651483535767}]
```
Here, `bbox` indicates the location of the detected object, `rec_docs` indicates the label in the index database that is most similar to the detected object, and `rec_scores` indicates the corresponding confidence.
The detection result is also saved in the folder `output`; for this image, the visualization result is as follows.
![](../../images/recognition/product_demo/result/daoxiangcunjinzhubing_6_en.jpg)
<a name="2.2.2"></a>
#### 2.2.2 Folder-based Batch Recognition
If you want to predict the images in a folder, you can directly modify the `Global.infer_imgs` field in the configuration file, or override it with the `-o` option as shown below.
```shell
# using the following command to predict using GPU, you can append `-o Global.use_gpu=False` to predict using CPU.
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./recognition_demo_data_v1.1/test_product/"
```
The results on the screen are shown as follows.
```
...
[{'bbox': [37, 29, 123, 89], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.6163763999938965}, {'bbox': [153, 96, 235, 175], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.5279821157455444}]
[{'bbox': [735, 562, 1133, 851], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.5588355660438538}]
[{'bbox': [124, 50, 230, 129], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.6980369687080383}]
[{'bbox': [0, 0, 275, 183], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.5818190574645996}]
[{'bbox': [400, 1179, 905, 1537], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.9814301133155823}, {'bbox': [295, 713, 820, 1046], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.9496176242828369}, {'bbox': [153, 236, 694, 614], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.8395382761955261}]
[{'bbox': [544, 4, 1482, 932], 'rec_docs': 'Chanel Handbag', 'rec_scores': 0.5143815279006958}]
...
```
All the visualization results are also saved in folder `output`.
Furthermore, the recognition inference model path can be changed by modifying the `Global.rec_inference_model_dir` field, and the path of the index database can be changed by modifying the `IndexProcess.index_dir` field.
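For example, the following hedged sketch overrides both fields: the lightweight model directory comes from the table in section 2, while `./my_index_dir` is only a placeholder for an index you have built yourself; as the attention note above explains, the index must have been built with that same recognition model.
```shell
# Swap in a different recognition model and index directory (placeholder paths).
python3.7 python/predict_system.py -c configs/inference_product.yaml \
    -o Global.rec_inference_model_dir="./models/general_PPLCNet_x2_5_lite_v1.0_infer" \
    -o IndexProcess.index_dir="./my_index_dir"
```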
<a name="3"></a>
## 3. Unknown Category Image Recognition Experience
To recognize the image `./recognition_demo_data_v1.1/test_product/anmuxi.jpg`, run the command as follows:
```shell
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./recognition_demo_data_v1.1/test_product/anmuxi.jpg"
```
The image to be retrieved is shown below.
![](../../images/recognition/product_demo/query/anmuxi.jpg)
The output is empty.
Since the corresponding index database does not contain the index information for this category, the recognition result is empty or incorrect. In this case, we can recognize images of unknown categories by building a new index database.
When the index database does not cover the scenes we actually want to recognize, i.e. when predicting images of unknown categories, we need to add similar images of the corresponding categories to the index database. This enables recognition of unknown-category images without retraining.
<a name="3.1"></a>
### 3.1 Prepare for the new images and labels
First, copy the images that are similar to the query image into the original image folder of the index database. The command is as follows.
```shell
cp -r ../docs/images/recognition/product_demo/gallery/anmuxi ./recognition_demo_data_v1.1/gallery_product/gallery/
```
Then you need to create a new label file which records the image path and label information. Use the following command to create a new file based on the original one.
```shell
# copy the file
cp recognition_demo_data_v1.1/gallery_product/data_file.txt recognition_demo_data_v1.1/gallery_product/data_file_update.txt
```
Then add the following new lines to the new label file.
```
gallery/anmuxi/001.jpg Anmuxi Ambrosial Yogurt
gallery/anmuxi/002.jpg Anmuxi Ambrosial Yogurt
gallery/anmuxi/003.jpg Anmuxi Ambrosial Yogurt
gallery/anmuxi/004.jpg Anmuxi Ambrosial Yogurt
gallery/anmuxi/005.jpg Anmuxi Ambrosial Yogurt
gallery/anmuxi/006.jpg Anmuxi Ambrosial Yogurt
```
Each line is split into two fields: the first is the relative image path and the second is its label. The delimiter is a tab character.
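If you prefer to append these lines from the command line instead of editing the file by hand, here is a minimal sketch (assuming the six images copied above and the tab-delimited format described here):
```shell
# Append one tab-separated "path<TAB>label" line per newly added gallery image.
for i in 001 002 003 004 005 006; do
    printf "gallery/anmuxi/%s.jpg\tAnmuxi Ambrosial Yogurt\n" "$i" \
        >> ./recognition_demo_data_v1.1/gallery_product/data_file_update.txt
done
```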
<a name="3.2"></a>
### 3.2 Build a new Index Library
Use the following command to build the index to accelerate the retrieval process after recognition.
```shell
python3.7 python/build_gallery.py -c configs/build_product.yaml -o IndexProcess.data_file="./recognition_demo_data_v1.1/gallery_product/data_file_update.txt" -o IndexProcess.index_dir="./recognition_demo_data_v1.1/gallery_product/index_update"
```
Finally, the new index information is stored in the folder `./recognition_demo_data_v1.1/gallery_product/index_update`. Use this new index database for the retrieval below.
<a name="3.3"></a>
### 3.3 Recognize the Unknown Category Images
To recognize the image `./recognition_demo_data_v1.1/test_product/anmuxi.jpg`, run the command as follows.
```shell
# using the following command to predict using GPU, you can append `-o Global.use_gpu=False` to predict using CPU.
python3.7 python/predict_system.py -c configs/inference_product.yaml -o Global.infer_imgs="./recognition_demo_data_v1.1/test_product/anmuxi.jpg" -o IndexProcess.index_dir="./recognition_demo_data_v1.1/gallery_product/index_update"
```
The output is as follows:
```
[{'bbox': [243, 80, 523, 522], 'rec_docs': 'Anmuxi Ambrosial Yogurt', 'rec_scores': 0.5570770502090454}]
```
The final recognition result is `Anmuxi Ambrosial Yogurt`, which is correct; the visualization result is as follows.
![](../../images/recognition/product_demo/result/anmuxi_en.jpg)
