Merge pull request 'Submit source code' (#14) from develop into master
commit
64b68db666
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@@ -0,0 +1,7 @@
distillation
================================

.. toctree::
   :maxdepth: 3

   distillation_en.md
@@ -0,0 +1,304 @@
# How to Contribute to the PaddleClas Community

------

## Catalogue

- [1. How to Contribute Code](#1)
  - [1.1 Branches of PaddleClas](#1.1)
  - [1.2 Commit Code to PaddleClas](#1.2)
    - [1.2.1 Fork and Clone the Code](#1.2.1)
    - [1.2.2 Connect to the Remote Repository](#1.2.2)
    - [1.2.3 Create the Local Branch](#1.2.3)
    - [1.2.4 Employ the Pre-commit Hook](#1.2.4)
    - [1.2.5 Modify and Commit Code](#1.2.5)
    - [1.2.6 Keep the Local Repository Updated](#1.2.6)
    - [1.2.7 Push to the Remote Repository](#1.2.7)
    - [1.2.8 Commit a Pull Request](#1.2.8)
    - [1.2.9 CLA and Unit Test](#1.2.9)
    - [1.2.10 Delete the Branch](#1.2.10)
    - [1.2.11 Conventions](#1.2.11)
- [2. Summary](#2)
- [3. References](#3)


<a name="1"></a>
## 1. How to Contribute Code


<a name="1.1"></a>
### 1.1 Branches of PaddleClas

PaddleClas maintains the following two branches:

- release/x.x series: stable release branches, which are tagged with the corresponding Paddle release in due course. The latest (and default) branch is release/2.3, which is compatible with Paddle v2.1.0. The release/x.x series will continue to grow with future iterations; the latest release is maintained by default, while earlier releases only receive bug fixes.
- develop: the development branch, which tracks the develop version of Paddle and is mainly used for developing new features. It is a good choice for secondary development. To ensure that a release/x.x branch can be cut from develop when needed, code on develop may only use APIs that are already available in Paddle's latest release branch. In other words, if a new API exists only in Paddle's develop branch but not yet in a release, please do not use it in PaddleClas. Apart from that, features that do not involve performance optimizations, parameter adjustments, or policy updates of the API can be developed normally.

The historical branches of PaddleClas are no longer maintained but are retained for existing users:

- release/static: this branch was used for static graph development and testing, and is compatible with Paddle versions >= 1.7. It is still usable if you specifically need to adapt to an old version of Paddle, but the code will not be updated except for bug fixes.
- dygraph-dev: this branch is no longer maintained and accepts no new code. Please migrate to the develop branch as soon as possible.

PaddleClas welcomes code contributions to the repo, and the basic process is detailed in the next part.

<a name="1.2"></a>
### 1.2 Commit Code to PaddleClas


<a name="1.2.1"></a>
#### 1.2.1 Fork and Clone the Code

- Go to the home page of [PaddleClas GitHub](https://github.com/PaddlePaddle/PaddleClas) and click the Fork button to generate a repository under your own account, such as `https://github.com/USERNAME/PaddleClas`.

![](../../images/quick_start/community/001_fork.png)

- Clone the remote repository to your local machine

```shell
# Pull the code of the develop branch
git clone https://github.com/USERNAME/PaddleClas.git -b develop
cd PaddleClas
```

You can obtain the clone address as shown below:

![](../../images/quick_start/community/002_clone.png)


<a name="1.2.2"></a>
#### 1.2.2 Connect to the Remote Repository

First check the current information of the remote repository with `git remote -v`.

```shell
origin    https://github.com/USERNAME/PaddleClas.git (fetch)
origin    https://github.com/USERNAME/PaddleClas.git (push)
```

The output above only lists the cloned remote repository, i.e., the PaddleClas fork under your username. Next, add the original PaddleClas repository as a remote named upstream.

```shell
git remote add upstream https://github.com/PaddlePaddle/PaddleClas.git
```

Run `git remote -v` again; two remote repositories, origin and upstream, should now be listed, as shown below.

```shell
origin    https://github.com/USERNAME/PaddleClas.git (fetch)
origin    https://github.com/USERNAME/PaddleClas.git (push)
upstream    https://github.com/PaddlePaddle/PaddleClas.git (fetch)
upstream    https://github.com/PaddlePaddle/PaddleClas.git (push)
```

This is mainly to keep the local repository updated when committing a pull request (PR).


<a name="1.2.3"></a>
#### 1.2.3 Create the Local Branch

Run the following command to create a new local branch based on the current one.

```shell
git checkout -b new_branch
```

You can also create a new branch based on a remote or an upstream branch.

```shell
# Create new_branch based on the develop branch of origin (the user's remote repository)
git checkout -b new_branch origin/develop
# Create new_branch based on the develop branch of upstream
# If you need to create a new branch from upstream,
# please run git fetch upstream first to fetch the upstream code
git checkout -b new_branch upstream/develop
```

The following output shows that you have switched to the new branch:

```
Branch new_branch set up to track remote branch develop from upstream.
Switched to a new branch 'new_branch'
```


<a name="1.2.4"></a>
#### 1.2.4 Employ the Pre-commit Hook

Paddle developers adopt the pre-commit tool to manage Git pre-commit hooks. It helps us format the source code (C++, Python) and automatically check basic issues before committing, e.g., one EOL per file, no large files added to Git, etc.

The pre-commit check is part of the unit tests in Travis-CI, and PRs that do not satisfy the hook cannot be submitted to PaddleClas. Please install pre-commit first and run it in the current directory:

```
pip install pre-commit
pre-commit install
```

- **Note**

  1. Paddle uses clang-format to format C/C++ source code. Please make sure `clang-format` is version 3.8 or higher.
  2. The `yapf` installed by `pip install pre-commit` differs slightly from the one installed by `conda install -c conda-forge pre-commit`; PaddleClas developers use the former.

<a name="1.2.5"></a>
#### 1.2.5 Modify and Commit Code

You can check the changed files via `git status`. Follow the steps below to commit the `README.md` of PaddleClas after modification:

```
git add README.md
pre-commit
```

Repeat the above steps until the pre-commit format check reports no errors, as shown below.

![](../../images/quick_start/community/003_precommit_pass.png)

Run the following command to commit.

```
git commit -m "your commit info"
```


<a name="1.2.6"></a>
#### 1.2.6 Keep the Local Repository Updated

Fetch the latest code from upstream and update the current branch. Here, upstream is the remote added in section 1.2.2 (`Connect to the Remote Repository`).

```
git fetch upstream
# If you want to commit to another branch, please pull the code from that branch of upstream, in this case develop
git pull upstream develop
```


<a name="1.2.7"></a>
#### 1.2.7 Push to the Remote Repository

```
git push origin new_branch
```


<a name="1.2.8"></a>
#### 1.2.8 Commit a Pull Request

Click new pull request and select the local branch and the target branch, as shown in the following figure. In the description of the PR, fill out what the PR accomplishes. Next, wait for the review; if any changes are required, update the corresponding branch in origin by referring to the steps above.

![](../../images/quick_start/community/004_create_pr.png)


<a name="1.2.9"></a>
#### 1.2.9 CLA and Unit Test

- When you commit a Pull Request to PaddlePaddle for the first time, you are required to sign a CLA (Contributor License Agreement) to ensure that your code can be merged. Please follow the steps below to sign the CLA:

  1. Examine the Check section of your PR, find license/cla, and click the Details link on the right side to enter the CLA website.
  2. Click `Sign in with GitHub to agree` on the CLA website; you will be redirected back to your Pull Request page when you are done.


<a name="1.2.10"></a>
#### 1.2.10 Delete the Branch

- Delete the remote branch

  When the PR is merged into the main repository, you can delete the remote branch from the PR page.

  You can also delete the remote branch with `git push origin :branch_name`, e.g.

  ```
  git push origin :new_branch
  ```

- Delete the local branch

  ```
  # Switch to the develop branch, otherwise the current branch cannot be deleted
  git checkout develop

  # Delete new_branch
  git branch -D new_branch
  ```


<a name="1.2.11"></a>
#### 1.2.11 Conventions

To help official maintainers focus on the code itself when reviewing it, please adhere to the following conventions each time you commit code:

1. Please pass the unit tests in Travis-CI first. Otherwise, the submitted code may have problems and usually receives no official review.

2. Before committing a Pull Request, pay attention to the number of commits.

   Reason: if only one file is modified but more than a dozen commits are submitted, each with only a few changes, this may overwhelm the reviewer, because they need to check every commit for its specific changes, including cases where later commits overwrite earlier ones.

   Recommendation: keep the number of commits to a minimum, and fold follow-up changes into the last commit with `git commit --amend`. For multiple commits that have already been pushed to the remote repository, please refer to [squash commits after push](https://stackoverflow.com/questions/5667884/how-to-squash-commits-in-git-after-they-have-been-pushed).

   Please also pay attention to the name of each commit: it should reflect the content of the current commit without being too casual.

3. If an issue is resolved, please add `fix #issue_number` to the first comment box of the Pull Request, so that the corresponding issue is closed automatically when the Pull Request is merged. Keywords such as close, closes, closed, fix, fixes, fixed, resolve, resolves, and resolved are all accepted; please choose the appropriate one. See details in [Closing issues via commit messages](https://help.github.com/articles/closing-issues-via-commit-messages).

In addition, please stick to the following conventions when responding to reviewers' comments:

1. Every review comment from the official maintainers is expected to be answered, which will better enhance the contribution of the open source community.

   - If you agree with the review and have finished the corresponding modification, simply reply "Done";
   - If you disagree with the review, please give your reasons.

2. If there are plenty of review comments,

   - please summarize the revision in general;
   - please reply with `start a review` instead of replying to each comment directly, since a direct reply sends an email for every single response, which can be overwhelming.


<a name="2"></a>
## 2. Summary

- The open source community relies on the contributions and feedback of developers and users. We highly appreciate that and look forward to your valuable comments and Pull Requests to PaddleClas, in the hope that together we can build a leading, practical, and comprehensive code repository for image recognition!


<a name="3"></a>
## 3. References

1. [Guide to PaddlePaddle Local Development](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/08_contribution/index_en.html)
2. [Committing PR to an Open Source Framework](https://blog.csdn.net/vim_wj/article/details/78300239)
@@ -0,0 +1,12 @@
advanced_tutorials
================================

.. toctree::
   :maxdepth: 2

   DataAugmentation_en.md
   distillation/index
   multilabel/index
   model_prune_quantization_en.md
   code_overview_en.md
   how_to_contribute_en.md
@@ -0,0 +1,7 @@
Multilabel Classification
================================

.. toctree::
   :maxdepth: 3

   multilabel_en.md
@@ -0,0 +1,92 @@
# Multilabel classification quick start

Based on the [NUS-WIDE-SCENE](https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html) dataset, which is a subset of the NUS-WIDE dataset, you can experience multilabel classification in PaddleClas, including training, evaluation, and prediction. Please refer to [Installation](../../installation/) to install PaddleClas first.

## Preparation

* Enter the PaddleClas directory

```
cd path_to_PaddleClas
```

* Create and enter the `dataset/NUS-WIDE-SCENE` directory, then download and decompress the NUS-WIDE-SCENE dataset

```shell
mkdir dataset/NUS-WIDE-SCENE
cd dataset/NUS-WIDE-SCENE
wget https://paddle-imagenet-models-name.bj.bcebos.com/data/NUS-SCENE-dataset.tar
tar -xf NUS-SCENE-dataset.tar
```

* Return to the `PaddleClas` root directory

```
cd ../../
```

## Training

```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    tools/train.py \
        -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml
```

After training for 10 epochs, the best accuracy on the validation set should be around 0.95.

## Evaluation

```bash
python tools/eval.py \
    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
    -o Arch.pretrained="./output/MobileNetV1/best_model"
```

## Prediction

```bash
python3 tools/infer.py \
    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
    -o Arch.pretrained="./output/MobileNetV1/best_model"
```

You will get output such as the following:

```
[{'class_ids': [6, 13, 17, 23, 26, 30], 'scores': [0.95683, 0.5567, 0.55211, 0.99088, 0.5943, 0.78767], 'file_name': './deploy/images/0517_2715693311.jpg', 'label_names': []}]
```
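
In multilabel classification, each class is scored independently (typically with a per-class sigmoid), so the `class_ids` above are simply the classes whose scores exceed a threshold. The following is a minimal NumPy sketch of that post-processing step, assuming a 0.5 threshold and random logits; it is only an illustration, not the exact implementation behind `tools/infer.py`.

```python
import numpy as np

def multilabel_postprocess(logits, threshold=0.5):
    """Map per-class logits to (class_ids, scores) for a multilabel prediction.

    Each class is scored independently with a sigmoid; classes whose score
    exceeds `threshold` are reported as positive.
    """
    scores = 1.0 / (1.0 + np.exp(-logits))   # per-class sigmoid
    keep = np.where(scores > threshold)[0]   # indices of the positive classes
    return keep.tolist(), np.round(scores[keep], 5).tolist()

# Toy example with random logits, for illustration only
class_ids, scores = multilabel_postprocess(np.random.randn(33))
print({'class_ids': class_ids, 'scores': scores})
```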

## Prediction based on the prediction engine

### Export the model

```bash
python3 tools/export_model.py \
    -c ./ppcls/configs/quick_start/professional/MobileNetV1_multilabel.yaml \
    -o Arch.pretrained="./output/MobileNetV1/best_model"
```

By default, the exported inference model is saved under `./inference` in the current path.

### Prediction based on the prediction engine

Enter the deploy directory:

```bash
cd ./deploy
```

Run prediction with the prediction engine:

```
python3 python/predict_cls.py \
    -c configs/inference_multilabel_cls.yaml
```

You will get output such as the following:

```
0517_2715693311.jpg: class id(s): [6, 13, 17, 23, 26, 30], score(s): [0.96, 0.56, 0.55, 0.99, 0.59, 0.79], label_name(s): []
```
@@ -0,0 +1,12 @@
algorithm_introduction
================================

.. toctree::
   :maxdepth: 2

   image_classification_en.md
   metric_learning_en.md
   knowledge_distillation_en.md
   model_prune_quantization_en.md
   ImageNet_models_en.md
   DataAugmentation_en.md
@@ -0,0 +1,94 @@
# Knowledge Distillation

---
## Content

* [1. Introduction of model compression methods](#1)
* [2. Application of knowledge distillation](#2)
* [3. Overview of knowledge distillation methods](#3)
  * [3.1 Response based distillation](#3.1)
  * [3.2 Feature based distillation](#3.2)
  * [3.3 Relation based distillation](#3.3)
* [4. References](#4)

<a name='1'></a>

## 1. Introduction of model compression methods

In recent years, deep neural networks have proven to be an extremely effective method for solving problems in computer vision and natural language processing. In most cases, a suitable neural network architecture outperforms traditional algorithms.

When the amount of data is large enough, increasing the model size in a reasonable way can significantly improve performance, but this sharply increases model complexity, and larger models are more expensive to train and deploy.

Deep neural networks generally contain redundant parameters. At present, there are several mainstream methods to compress models and reduce their parameters, such as pruning, quantization, and knowledge distillation. Knowledge distillation refers to using a teacher model to guide a student model in learning a specific task, so that the small model obtains relatively large performance gains and may even reach accuracy comparable to the large model [1].

Currently, knowledge distillation methods can be roughly divided into the following three types.

* Response based distillation: the output of the student model is guided by the output of the teacher model.
* Feature based distillation: the inner feature maps of the student model are guided by those of the teacher model.
* Relation based distillation: for different samples, the teacher model and the student model are used to compute the correlation of the feature maps between samples; the final goal is to make the correlation matrices of the student model and the teacher model as consistent as possible.


<a name='2'></a>

## 2. Application of knowledge distillation

Knowledge distillation is widely used in lightweight model tasks. For tasks that need to meet a specific accuracy, the knowledge distillation method allows us to achieve the required accuracy with a smaller model, thereby reducing model deployment cost.

What's more, for the same model structure, pre-trained models obtained by knowledge distillation often perform better, and these pre-trained models can also improve the performance of downstream tasks. For example, a pre-trained image classification model with higher accuracy can also help other tasks such as object detection, image segmentation, OCR, and video classification obtain significant accuracy gains.

<a name='3'></a>

## 3. Overview of knowledge distillation methods

<a name='3.1'></a>

### 3.1 Response based distillation

The knowledge distillation (KD) algorithm was first proposed by Hinton. In addition to the base cross-entropy loss, a KL divergence loss between the outputs of the student model and the teacher model is added to the total training loss. Note that a larger teacher model is needed to guide the training process of the student model.
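
As a minimal illustration of the response-based loss described above, the NumPy sketch below combines the standard cross-entropy on hard labels with a temperature-softened KL divergence between the teacher's and the student's outputs. The temperature `T` and mixing weight `alpha` are assumed hyperparameters; this is the generic KD formulation, not the exact loss implementation in PaddleClas.

```python
import numpy as np

def softmax(x, T=1.0):
    z = np.exp((x - x.max(axis=-1, keepdims=True)) / T)
    return z / z.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Response-based KD: cross-entropy on hard labels + KL(teacher || student) on softened outputs."""
    p_s = softmax(student_logits)  # student probabilities (T=1) for the cross-entropy term
    ce = -np.log(p_s[np.arange(len(labels)), labels] + 1e-12).mean()

    p_t = softmax(teacher_logits, T)                     # softened teacher probabilities
    log_s = np.log(softmax(student_logits, T) + 1e-12)   # softened student log-probabilities
    kl = (p_t * (np.log(p_t + 1e-12) - log_s)).sum(axis=-1).mean() * T * T

    return (1 - alpha) * ce + alpha * kl

# Toy batch: 4 samples, 10 classes
s, t = np.random.randn(4, 10), np.random.randn(4, 10)
print(kd_loss(s, t, labels=np.array([0, 3, 7, 1])))
```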

PaddleClas proposed a simple but effective knowledge distillation algorithm called SSLD [6]. Labels are not needed for SSLD, so unlabeled data can also be used for training. Using SSLD, the accuracy of 15 models improved by more than 3%.

The distillation methods mentioned above need a teacher model to guide the student model's training process. Deep Mutual Learning (DML) was then proposed [7], in which two models with the same architecture learn from each other to obtain higher accuracy. Compared with KD and other knowledge distillation algorithms that rely on a large teacher model, DML is free of that dependence, and its distillation training process is simpler.

<a name='3.2'></a>

### 3.2 Feature based distillation

Heo et al. proposed OverHaul [8], which uses the distance between the feature maps of the student model and the teacher model as the distillation loss. The student's feature map is aligned with the teacher's so that this distance can be computed.
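
A minimal sketch of the feature-based idea: project (align) the student's feature map to the teacher's channel dimension, then use a distance between the two feature maps as the distillation loss. The 1x1-projection-style alignment and plain L2 distance below are illustrative assumptions and do not reproduce the specific design choices of OverHaul.

```python
import numpy as np

def feature_distill_loss(feat_s, feat_t, proj):
    """L2 distance between an aligned student feature map and a teacher feature map.

    feat_s: (N, C_s, H, W) student features;  feat_t: (N, C_t, H, W) teacher features;
    proj:   (C_t, C_s) linear projection that aligns student channels to teacher channels.
    """
    aligned = np.einsum('ts,nshw->nthw', proj, feat_s)  # 1x1-conv-style channel alignment
    return ((aligned - feat_t) ** 2).mean()

feat_s = np.random.randn(2, 32, 7, 7)   # student feature map
feat_t = np.random.randn(2, 64, 7, 7)   # teacher feature map
proj = np.random.randn(64, 32) * 0.1    # alignment weights (would be learned in practice)
print(feature_distill_loss(feat_s, feat_t, proj))
```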

Feature based distillation can also be integrated with the response based knowledge distillation algorithm described in section 3.1, which means both the inner feature maps and the output of the student model are guided during the training process. For the DML method, this integration is simpler, because no alignment is needed: the two models' architectures are identical. This integration is used in the PP-OCRv2 system [9], where it greatly improves the accuracy of the OCR text recognition model.

<a name='3.3'></a>

### 3.3 Relation based distillation

The papers in sections 3.1 and 3.2 mainly consider the inner feature maps or the final output of the student model and the teacher model. These knowledge distillation algorithms only focus on the output for a single sample and do not consider the relationships between the outputs of different samples.

Park et al. proposed RKD [10], a relation-based knowledge distillation algorithm. In RKD, the relationships between different samples are further considered, using two loss functions: the second-order distance loss (distance-wise) and the third-order angle loss (angle-wise). For the final distillation loss, the KD loss and the RKD loss are both taken into account. The final accuracy is better than that of a model trained with the KD loss alone.
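
The distance-wise part of RKD can be sketched as follows: compute the pairwise distance matrix among the samples in a batch for both the teacher and the student embeddings, normalize each by its mean distance, and penalize the difference. RKD uses a smooth-L1 (Huber) penalty and an additional angle-wise term; the plain squared error below is a simplifying assumption to keep the sketch short.

```python
import numpy as np

def pairwise_dist(x):
    """Euclidean distance matrix among the rows of x, normalized by its mean off-diagonal distance."""
    sq = (x ** 2).sum(axis=1)
    d = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * x @ x.T, 0.0))
    off_diag = ~np.eye(len(x), dtype=bool)
    return d / (d[off_diag].mean() + 1e-12)

def rkd_distance_loss(student_emb, teacher_emb):
    """Distance-wise relational loss: match the relative distance structure of the two embedding spaces."""
    d_s, d_t = pairwise_dist(student_emb), pairwise_dist(teacher_emb)
    return ((d_s - d_t) ** 2).mean()

# Embedding dimensions may differ, since only relative distances are compared
print(rkd_distance_loss(np.random.randn(8, 128), np.random.randn(8, 256)))
```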

<a name='4'></a>

## 4. References

[1] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.

[2] Bagherinezhad H, Horton M, Rastegari M, et al. Label refinery: Improving imagenet classification through label progression[J]. arXiv preprint arXiv:1805.02641, 2018.

[3] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.

[4] Cubuk E D, Zoph B, Mane D, et al. Autoaugment: Learning augmentation strategies from data[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2019: 113-123.

[5] Touvron H, Vedaldi A, Douze M, et al. Fixing the train-test resolution discrepancy[C]//Advances in Neural Information Processing Systems. 2019: 8250-8260.

[6] Cui C, Guo R, Du Y, et al. Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones[J]. arXiv preprint arXiv:2103.05959, 2021.

[7] Zhang Y, Xiang T, Hospedales T M, et al. Deep mutual learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4320-4328.

[8] Heo B, Kim J, Yun S, et al. A comprehensive overhaul of feature distillation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 1921-1930.

[9] Du Y, Li C, Guo R, et al. PP-OCRv2: Bag of Tricks for Ultra Lightweight OCR System[J]. arXiv preprint arXiv:2109.03144, 2021.

[10] Park W, Kim D, Lu Y, et al. Relational knowledge distillation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3967-3976.
@@ -0,0 +1,40 @@
# Metric Learning

## Catalogue

- [1. Introduction](#1)
- [2. Applications](#2)
- [3. Algorithms](#3)
  - [3.1 Classification based](#3.1)
  - [3.2 Pairwise based](#3.2)

<a name="1"></a>
## 1. Introduction

Measuring the distance between data points is a common practice in machine learning. Generally speaking, Euclidean distance, inner product, or cosine similarity can be computed directly on structured (vector) data. However, the same operation can hardly be replicated on unstructured data, such as calculating the compatibility between a video and a piece of music. Despite the difficulty in performing such vector operations directly due to the varied data formats, prior knowledge tells us that ED(laugh_video, laugh_music) < ED(laugh_video, blue_music). How to effectively characterize this "distance" is exactly the focus of Metric Learning.

Metric Learning, also known as Distance Metric Learning, automatically constructs a task-specific metric function from training data by means of machine learning. The goal of Metric Learning is to learn a transformation function (either linear or nonlinear) L that maps data points from the original vector space to a new one in which similar points are closer together and dissimilar points are further apart, making the metric more task-appropriate. Deep Metric Learning fits this transformation function with a deep neural network.

<a name="2"></a>
## 2. Applications

Metric Learning technologies are widely applied in real life, such as Face Recognition, Person ReID, Image Retrieval, and Fine-grained Classification. With the growing prevalence of deep learning in industrial practice, Deep Metric Learning (DML) has become the current research direction.

Normally, DML consists of three parts: a feature extraction network for embedding, a sampling strategy that combines the samples in a mini-batch into multiple sub-sets, and a loss function computed on each sub-set.

<a name="3"></a>
## 3. Algorithms

Two learning paradigms are adopted in Metric Learning:

<a name="3.1"></a>
### 3.1 Classification based

This refers to methods based on classification labels. They learn an effective feature representation by classifying each sample into the correct category, and they require the explicit label of each sample to participate in the loss calculation during training. Common algorithms include [L2-Softmax](https://arxiv.org/abs/1703.09507), [Large-margin Softmax](https://arxiv.org/abs/1612.02295), [Angular Softmax](https://arxiv.org/pdf/1704.08063.pdf), [NormFace](https://arxiv.org/abs/1704.06369), [AM-Softmax](https://arxiv.org/abs/1801.05599), [CosFace](https://arxiv.org/abs/1801.09414), [ArcFace](https://arxiv.org/abs/1801.07698), etc. These methods are also called proxy-based, because what they essentially optimize is the similarity between a sample and a set of proxies.
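
As a minimal sketch of the proxy-based idea, the NumPy snippet below implements an additive-margin cosine softmax in the style of NormFace/AM-Softmax: embeddings and class proxies are L2-normalized, logits are cosine similarities scaled by `s`, and a margin `m` is subtracted from the target-class cosine before the softmax cross-entropy. The values of `s` and `m` are illustrative assumptions.

```python
import numpy as np

def am_softmax_loss(emb, proxies, labels, s=30.0, m=0.35):
    """Additive-margin softmax on cosine logits between samples and class proxies."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)          # normalize embeddings
    w = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)  # normalize class proxies
    cos = e @ w.T                                                 # (N, num_classes) cosine similarities
    cos[np.arange(len(labels)), labels] -= m                      # additive margin on the target class
    logits = s * cos
    logits -= logits.max(axis=1, keepdims=True)                   # numerically stable log-softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

print(am_softmax_loss(np.random.randn(8, 128), np.random.randn(10, 128), np.random.randint(0, 10, 8)))
```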

<a name="3.2"></a>
### 3.2 Pairwise based

This refers to the learning paradigm based on paired samples. It takes sample pairs as input and obtains an effective feature representation by directly learning the similarities between these pairs. Common algorithms include [Contrastive loss](http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf), [Triplet loss](https://arxiv.org/abs/1503.03832), [Lifted-Structure loss](https://arxiv.org/abs/1511.06452), N-pair loss, [Multi-Similarity loss](https://arxiv.org/pdf/1904.06627.pdf), etc.
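
As a minimal sketch of the pairwise paradigm, the triplet loss below pulls an anchor toward a positive sample of the same class and pushes it away from a negative sample by at least a margin; the margin value is an illustrative assumption.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """mean(max(0, d(a, p) - d(a, n) + margin)) over the batch."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distances
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distances
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

a, p, n = (np.random.randn(16, 128) for _ in range(3))
print(triplet_loss(a, p, n))
```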

[CircleLoss](https://arxiv.org/abs/2002.10857), released in 2020, unifies the two learning paradigms from a fresh perspective, prompting further reflection on Metric Learning among researchers and practitioners.
@@ -0,0 +1,3 @@
# Release Notes

* 2020.04.14: first commit
@@ -0,0 +1,65 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))
import sphinx_rtd_theme
from recommonmark.parser import CommonMarkParser

# -- Project information -----------------------------------------------------

project = 'PaddleClas-en'
copyright = '2022, PaddleClas'
author = 'PaddleClas'

# The full version, including alpha/beta/rc tags
release = '2.3'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
source_parsers = {
    '.md': CommonMarkParser,
}
source_suffix = ['.rst', '.md']
extensions = [
    'recommonmark',
    'sphinx_markdown_tables'
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The root document.
root_doc = 'doc_en'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# Change the documentation color scheme.
html_theme = "sphinx_rtd_theme"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
@@ -0,0 +1,8 @@
data_preparation
================================

.. toctree::
   :maxdepth: 2

   recognition_dataset_en.md
   classification_dataset_en.md
@@ -0,0 +1,10 @@
faq_series
================================

.. toctree::
   :maxdepth: 2

   faq_2021_s2_en.md
   faq_2021_s1_en.md
   faq_2020_s1_en.md
   faq_selected_30_en.md
@@ -0,0 +1,9 @@
image_recognition_pipeline
================================

.. toctree::
   :maxdepth: 2

   mainbody_detection_en.md
   feature_extraction_en.md
   vector_search_en.md
@@ -0,0 +1,120 @@
# Vector Search

Vector search is widely used in image recognition and image retrieval. It aims to obtain a similarity ranking for a given query vector by computing the similarity or distance between that feature vector and all the vectors in an established vector library. In the image recognition system, [Faiss](https://github.com/facebookresearch/faiss) is adopted for this purpose; please check [the official website of Faiss](https://github.com/facebookresearch/faiss) for more details. The main advantages of `Faiss` can be summarized as follows:

- Great adaptability: supports Windows, Linux, and macOS
- Easy installation: provides a `python` interface and can be installed directly with `pip`
- Rich algorithms: supports a variety of search algorithms to cover different scenarios
- Supports both CPU and GPU, which accelerates the search process

It is worth noting that the current version of `PaddleClas` **only uses the CPU for vector retrieval** in pursuit of better adaptability.

![](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/structure.jpg)

As shown in the figure above, two parts constitute the vector search in the whole `PP-ShiTu` system.

- The green part: building the search library for retrieval, while providing functions such as adding and deleting images.
- The blue part: the search function, i.e., given the feature vector of an image, return the labels of similar images in the library.

This document mainly introduces the installation of the search module in PaddleClas, the supported search algorithms, the library building process, and the parameters in the relevant configuration files.

------

## Catalogue

- [1. Installation of the Search Library](#1)
- [2. Search Algorithms](#2)
- [3. Introduction of Configuration Files](#3)
  - [3.1 Parameters of Library Building Configuration Files](#3.1)
  - [3.2 Parameters of Search Configuration Files](#3.2)

<a name="1"></a>

## 1. Installation of the Search Library

`Faiss` can be installed as follows:

```
pip install faiss-cpu==1.7.1post2
```

If the above package does not work properly, please `uninstall` it and then `install` it again, especially when you are using Windows.

<a name="2"></a>

## 2. Search Algorithms

Currently, the search module in `PaddleClas` supports the following three search algorithms:

- **HNSW32**: a graph indexing method that boasts high retrieval accuracy and fast speed. However, the feature library only supports adding image features, not deleting them. (Default method)
- **IVF**: an inverted index search method with fast speed but slightly lower precision. The feature library supports adding and deleting image features.
- **FLAT**: a brute-force search algorithm with the highest precision, but slower retrieval on large data volumes. The feature library supports adding and deleting image features.

Each search algorithm has its place in different scenarios. `HNSW32`, the default method, strikes a balance between accuracy and speed; see its detailed introduction in the [official document](https://github.com/facebookresearch/faiss/wiki).
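
To make the three options concrete, the sketch below builds each index type directly with the `faiss` Python API on random 512-dim features and runs a top-5 query with the inner-product metric. It is a standalone illustration of the underlying library calls, not the code path used by `python/build_gallery.py`.

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim, n_gallery, n_query = 512, 5000, 3
gallery = np.random.rand(n_gallery, dim).astype("float32")
queries = np.random.rand(n_query, dim).astype("float32")

# The three index types supported by the PaddleClas search module, built via index_factory
indexes = {
    "HNSW32": faiss.index_factory(dim, "HNSW32", faiss.METRIC_INNER_PRODUCT),
    "IVF": faiss.index_factory(dim, "IVF100,Flat", faiss.METRIC_INNER_PRODUCT),
    "Flat": faiss.index_factory(dim, "Flat", faiss.METRIC_INNER_PRODUCT),
}

for name, index in indexes.items():
    if not index.is_trained:      # IVF needs a training pass to learn its coarse clusters
        index.train(gallery)
    index.add(gallery)            # add the gallery features to the library
    scores, ids = index.search(queries, 5)  # top-5 most similar gallery items per query
    print(name, ids[0], scores[0])
```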

<a name="3"></a>

## 3. Introduction of Configuration Files

Configuration files involving the search module are under `deploy/configs/`, where `build_*.yaml` is related to building the feature library, and `inference_*.yaml` is the inference file for retrieval or classification.

<a name="3.1"></a>

### 3.1 Parameters of Library Building Configuration Files

The library is built as follows:

```
# Enter the deploy directory
cd deploy
# Change the yaml file to the specific one you need
python python/build_gallery.py -c configs/build_***.yaml
```

The `yaml` file for library building is configured as follows; please adjust it to fit your actual setup. The build process extracts the features of the images under `image_root` according to the image list in `data_file` and stores them under `index_dir` for subsequent search.

The `data_file` stores the paths and labels of the image files, with each line in the format `image_path label`, where the two fields are separated by the `delimiter` parameter in the `yaml` file.
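
For illustration, the following sketch parses such a `data_file`, assuming the tab delimiter and the demo paths used in the example configuration below; the actual parsing is done inside `python/build_gallery.py`.

```python
# Each line of data_file has the form: relative_path<delimiter>label
data_file = "./recognition_demo_data_v1.1/gallery_product/data_file.txt"
delimiter = "\t"

with open(data_file, "r", encoding="utf-8") as f:
    for line in f:
        rel_path, label = line.rstrip("\n").split(delimiter, 1)
        print(rel_path, "->", label)
```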

The specific model parameters for feature extraction can be found in the `yaml` file.

```
# indexing engine config
IndexProcess:
  index_method: "HNSW32" # supported: HNSW32, IVF, Flat
  index_dir: "./recognition_demo_data_v1.1/gallery_product/index"
  image_root: "./recognition_demo_data_v1.1/gallery_product/"
  data_file: "./recognition_demo_data_v1.1/gallery_product/data_file.txt"
  index_operation: "new" # supported: "append", "remove", "new"
  delimiter: "\t"
  dist_type: "IP"
  embedding_size: 512
```

- **index_method**: the search algorithm. Three are currently supported: HNSW32, IVF, and Flat.
- **index_dir**: the folder where the built feature library is stored.
- **image_root**: the folder where the annotated images used to build the feature library are stored.
- **data_file**: the data list of the annotated images used to build the feature library; the format of each line is: relative_path label.
- **index_operation**: the operation used to build the library: `new` to build from scratch, `append` to add the image features of data_file to the feature library, `remove` to delete the images of data_file from the feature library.
- **delimiter**: the delimiter for each line in **data_file**.
- **dist_type**: the similarity metric used in feature matching, e.g., Inner Product (`IP`) or Euclidean distance (`L2`).
- **embedding_size**: the feature dimensionality.

<a name="3.2"></a>

### 3.2 Parameters of Search Configuration Files

To integrate the search into the overall `PP-ShiTu` process, please refer to `The Introduction of PP-ShiTu Image Recognition System` in the [README](../../../README_en.md). Please check the [Quick Start for Image Recognition](../quick_start/quick_start_recognition_en.md) for the specific operation of the search.

The search part is configured as follows. Please refer to `deploy/configs/inference_*.yaml` for the complete version.

```
IndexProcess:
  index_dir: "./recognition_demo_data_v1.1/gallery_logo/index/"
  return_k: 5
  score_thres: 0.5
```

The following parameters are new compared with the library building configuration file:

- `return_k`: the top `k` results are returned
- `score_thres`: the score threshold for a retrieved result to be considered a match
@@ -0,0 +1,19 @@
inference_deployment
================================

.. toctree::
   :maxdepth: 2

   export_model_en.md
   python_deploy_en.md
   cpp_deploy_en.md
   paddle_serving_deploy_en.md
   paddle_hub_serving_deploy_en.md
   paddle_lite_deploy_en.md
   whl_deploy_en.md
@@ -0,0 +1,142 @@
# Inferring Based on the Python Prediction Engine

The inference model (the model saved by `paddle.jit.save`) is generally a solidified model saved after training is completed, and it is mostly used for prediction in deployment.

The model saved during the training process is the checkpoints model, which stores only the parameters of the model and is mostly used to resume training.

Compared with the checkpoints model, the inference model additionally saves the structural information of the model. It is therefore easier to deploy, because the model structure and parameters are already solidified in the inference model files, and it is suitable for integration with actual systems.
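
As a rough sketch of the difference, the snippet below saves the same network both ways: `paddle.save` keeps only the parameters (checkpoints style), while `paddle.jit.to_static` plus `paddle.jit.save` solidifies structure and parameters into `*.pdmodel` / `*.pdiparams` files. The network (`mobilenet_v1` from `paddle.vision.models`), the input shape, and the file names are illustrative assumptions; in practice, please export PaddleClas models with `tools/export_model.py` as described in the linked document.

```python
import paddle
from paddle.static import InputSpec
from paddle.vision.models import mobilenet_v1  # stand-in for a trained PaddleClas network

model = mobilenet_v1(pretrained=False)
model.eval()

# Checkpoints-style save: parameters only, used to resume training
paddle.save(model.state_dict(), "demo_checkpoint.pdparams")

# Inference-model-style save: structure + parameters, used for deployment
static_model = paddle.jit.to_static(
    model, input_spec=[InputSpec(shape=[None, 3, 224, 224], dtype="float32")])
paddle.jit.save(static_model, "inference")  # writes inference.pdmodel / inference.pdiparams
```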

Please refer to the documents [install paddle](../installation/install_paddle_en.md) and [install paddleclas](../installation/install_paddleclas_en.md) to prepare the environment.

---

## Catalogue

- [1. Image classification inference](#1)
- [2. Mainbody detection model inference](#2)
- [3. Feature Extraction model inference](#3)
- [4. Concatenation of mainbody detection, feature extraction and vector search](#4)


<a name="1"></a>
## 1. Image classification inference

First, please refer to the document [export model](./export_model_en.md) to prepare the inference model files. All the commands should be run under the `deploy` folder of PaddleClas:

```shell
cd deploy
```

For classification model inference, you can execute the following command:

```shell
python python/predict_cls.py -c configs/inference_cls.yaml
```

In the configuration file `configs/inference_cls.yaml`, the following fields are used to configure prediction parameters:
* `Global.infer_imgs`: the path of the image to be predicted;
* `Global.inference_model_dir`: the directory of the inference model files, which should contain `inference.pdmodel` and `inference.pdiparams`;
* `Global.use_tensorrt`: whether to use `TensorRT`, `False` by default;
* `Global.use_gpu`: whether to use the GPU, `True` by default;
* `Global.enable_mkldnn`: whether to use `MKL-DNN`, `False` by default. Valid only when `use_gpu` is `False`;
* `Global.use_fp16`: whether to use `FP16`, `False` by default;
* `PreProcess`: the preprocessing of the image to be predicted;
* `PostProcess`: the postprocessing of prediction results;
* `PostProcess.Topk.class_id_map_file`: the path of the file mapping class ids to labels, ImageNet1k (`./utils/imagenet1k_label_list.txt`) by default.

**Notice**:
* If VisionTransformer series models are used, such as `DeiT_***_384` and `ViT_***_384`, please pay attention to the model input size: you may need to specify `PreProcess.resize_short=384` and `PreProcess.resize=384`.
* If you want to improve the evaluation speed, it is recommended to enable TensorRT when using the GPU, and MKL-DNN when using the CPU.

```shell
python python/predict_cls.py -c configs/inference_cls.yaml -o Global.infer_imgs=images/ILSVRC2012_val_00010010.jpeg
```

If you want to use the CPU for prediction, you can set `use_gpu` in the config file to `False`, or you can execute the command as follows:

```
python python/predict_cls.py -c configs/inference_cls.yaml -o Global.use_gpu=False
```

<a name="2"></a>
## 2. Mainbody detection model inference

The following introduces mainbody detection model inference. All the commands should be run under the `deploy` folder of PaddleClas:

```shell
cd deploy
```

For mainbody detection model inference, you can execute the following commands:

```shell
mkdir -p models
cd models
# download mainbody detection inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar && tar -xf ppyolov2_r50vd_dcn_mainbody_v1.0_infer.tar
cd ..
# predict
python python/predict_det.py -c configs/inference_det.yaml
```

The input example image is as follows:

![](../images/recognition/product_demo/wangzai.jpg)

The output will be:

```text
[{'class_id': 0, 'score': 0.4762245, 'bbox': array([305.55115, 226.05322, 776.61084, 930.42395], dtype=float32), 'label_name': 'foreground'}]
```

The visualized result is as follows:

![](../images/recognition/product_demo/wangzai_det_result.jpg)

If you want to detect another image, you can change the value of `infer_imgs` in `configs/inference_det.yaml`, or you can use the `-o Global.infer_imgs` argument. For example, to detect `images/anmuxi.jpg`:

```shell
python python/predict_det.py -c configs/inference_det.yaml -o Global.infer_imgs=images/anmuxi.jpg
```

If you want to use the CPU for prediction, you can set `use_gpu` in the config file to `False`, or you can execute the command as follows:

```
python python/predict_det.py -c configs/inference_det.yaml -o Global.use_gpu=False
```

<a name="3"></a>
## 3. Feature Extraction model inference

First, please refer to the document [export model](./export_model_en.md) to prepare the inference model files. All the commands should be run under the `deploy` folder of PaddleClas:

```shell
cd deploy
```

For feature extraction model inference, you can execute the following commands:

```shell
mkdir -p models
cd models
# download feature extraction inference model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar && tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
cd ..
# predict
python python/predict_rec.py -c configs/inference_rec.yaml
```

You will get a 512-dim feature printed in the command line.

If you want to extract the feature of another image, you can change the value of `infer_imgs` in `configs/inference_rec.yaml`, or you can use the `-o Global.infer_imgs` argument. For example, to try `images/anmuxi.jpg`:

```shell
python python/predict_rec.py -c configs/inference_rec.yaml -o Global.infer_imgs=images/anmuxi.jpg
```

If you want to use the CPU for prediction, you can set `use_gpu` in the config file to `False`, or you can execute the command as follows:

```
python python/predict_rec.py -c configs/inference_rec.yaml -o Global.use_gpu=False
```

<a name="4"></a>
## 4. Concatenation of mainbody detection, feature extraction and vector search

Please refer to the [Quick Start of Recognition](../quick_start/quick_start_recognition_en.md).
@@ -0,0 +1,8 @@
installation
================================

.. toctree::
   :maxdepth: 2

   install_paddle_en.md
   install_paddleclas_en.md
@@ -0,0 +1,32 @@
# Install PaddleClas

---

## Catalogue

* [1. Clone PaddleClas source code](#1)
* [2. Install requirements](#2)

<a name='1'></a>

## 1. Clone PaddleClas source code

```shell
git clone https://github.com/PaddlePaddle/PaddleClas.git -b develop
```

If downloading from GitHub is too slow, you can download PaddleClas from Gitee instead. The command is as follows.

```shell
git clone https://gitee.com/paddlepaddle/PaddleClas.git -b develop
```

<a name='2'></a>

## 2. Install requirements

The PaddleClas dependencies are listed in `requirements.txt`. You can use the following command to install them.

```
pip install --upgrade -r requirements.txt -i https://mirror.baidu.com/pypi/simple
```
@@ -0,0 +1,13 @@
## Features of PaddleClas

PaddleClas is an image recognition toolset for industry and academia that helps users train better computer vision models and apply them in real scenarios. Specifically, it contains the following core features.

- Practical image recognition system: integrates detection, feature learning, and retrieval modules, applicable to all kinds of image recognition tasks. Four sample solutions are provided, covering product recognition, vehicle recognition, logo recognition, and animation character recognition.
- Rich library of pre-trained models: provides a total of 175 ImageNet pre-trained models across 36 series, among which 7 selected series support fast structural modification.
- Comprehensive and easy-to-use feature learning components: 12 metric learning methods are integrated and can be combined and switched at will through configuration files.
- SSLD knowledge distillation: the 14 classification pre-trained models generally improve accuracy by more than 3%; among them, the ResNet50_vd model achieves a Top-1 accuracy of 84.0% on the ImageNet-1k dataset, and the Res2Net200_vd pre-trained model achieves a Top-1 accuracy of 85.1%.
- Data augmentation: provides 8 data augmentation algorithms such as AutoAugment, Cutout, and CutMix, with detailed introductions, code reproduction, and evaluation of effectiveness in a unified experimental environment.

For more information about the quick start of image recognition, algorithm details, model training and evaluation, and prediction and deployment methods, please refer to the [README Tutorial](../../../README_ch.md) on the home page.
@@ -0,0 +1,8 @@
introduction
================================

.. toctree::
   :maxdepth: 2

   function_intro_en.md
   more_demo/index
@@ -0,0 +1,53 @@
# Cartoon Demo Images
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069080-a821e0b7-8a10-4946-bf05-ff093cc16064.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069100-7539d292-1bd8-4655-8a6d-d1f2238bd618.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069103-f91359d4-1197-4a6e-b2f7-434c76a6b704.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069108-ad54ae1d-610d-4cfa-9cd6-8ee8d280d61d.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069114-3c771434-84a8-4e58-961e-d35edfbfe5ef.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069119-e8d85be5-da87-4125-ae8b-9fd4cac139d9.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069124-98c30894-4837-4f2f-8399-3d3ebadfd0a1.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069125-a9edf115-33a1-48bf-9e4f-7edbc4269a1e.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069129-98553a25-00e2-4f0f-9b44-dfc4e4f6b6d1.png " width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069131-f7649bb2-255c-4725-a635-799b8b4d815a.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069135-acb69b89-55db-41ac-9846-e2536ef3d955.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069137-1f0abfdb-6608-432e-bd40-c8e1ab86ef8b.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069140-18c6a439-f117-498d-9cdb-ade71cc2c248.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069145-80452f86-afcf-42b5-8423-328cca9e4750.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069154-63a25c1c-b448-44c2-8baf-eb31952c5476.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069156-1b881c6b-5680-4f9a-aef1-2491af50675d.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069161-8759f3d4-8456-43ea-bf54-99a646d5a109.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069167-937aa847-c661-431c-b3dc-5a3c890b31cd.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069170-43d0dce4-6c62-485d-adf4-364c8467c251.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069175-70bc9e50-b833-4a2a-8a3f-c0775dac49c2.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069179-d01f8a0f-4383-4b08-b064-4e6bb006e745.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069184-d423a84c-c9dd-4125-9dc7-397cae21efc9.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069188-fc4deb80-38a2-4c50-9a29-30cee4c8e374.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069193-77a19ee8-b1e2-4c27-9016-3440a1547470.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069196-5f050524-ac08-4831-89f5-9e9e3ce085c1.jpeg" width = "400" /> </div>
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069200-4f963171-c790-4f43-8ca3-2e701ad3731c.jpeg" width = "400" /> </div>
@@ -0,0 +1,11 @@
more_demo
================================

.. toctree::
   :maxdepth: 1

   product.md
   logo.md
   cartoon.md
   more_demo.md
   vehicle.md
@@ -0,0 +1,65 @@
# Logo Demo Images
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096687-5b562e2d-0653-4be6-861d-1936a4440df2.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096698-4b95eb4b-6638-47dc-ae48-7b40744a31ba.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096701-4a4b2bd9-85f2-4d55-be4b-be6ab5e0fb81.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096706-ef4ad024-7284-4cb3-975a-779fd06b96f5.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096710-620b0495-cc83-4501-a104-dfe20afb53d2.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096713-48e629aa-c637-4603-b005-18570fa94d6d.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096715-709957f2-50bb-4edb-a6e4-e7d5601872c7.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096717-a74744cc-4fb8-4e78-b1cb-20409582ca52.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096721-d4af003c-7945-4591-9e47-4e428dc2628c.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096726-460af6ab-8595-4fb4-9960-4c66b18bee1e.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096728-81494000-92b5-40ad-a6a7-606dae3548a3.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096731-2e980977-9ee6-4e29-bdf7-8397820f70e8.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096732-7d425b45-6b04-4984-948d-278da13dd802.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096735-a9c85c14-5965-4529-a235-ce00035bd7ab.jpg " width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096736-3182efc6-ba43-4cde-9397-88a131f4fed8.jpg " width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096737-91e6fa24-1eb5-4aba-9271-5a3722cbe35b.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096740-f440f89b-5f95-493a-b087-00c7cd3481ef.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096747-31b924e3-ffb2-45ab-872e-4ff923ed04f1.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096752-1f98c937-5d83-4c29-b495-01971b5fb258.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096756-a994c7e2-b9e7-40ba-9934-78c10666217b.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096757-879749e0-9e04-4d1e-a07b-6a4322975a84.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096761-5b682ce8-4f83-4fbb-bfb7-df749912aa8b.png " width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096767-e8f701eb-d0e8-4304-b031-e2bff8c199f3.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096778-ec2ad374-b9fc-427e-9e8b-8e5d2afc6394.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096783-9ec5e04d-19e3-463d-ad9d-7a26202bbb9c.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096788-44f04979-18ca-4ba6-b833-7489b344ffff.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096791-6989451e-157c-4101-8b54-7578b05eb7c9.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096796-cc4477cf-016c-4b19-86c3-61824704ecf5.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096798-ba33ee0d-45b8-48ad-a8fa-14cd643a6976.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096805-e29a2ba8-4785-4ca6-9e0d-596fad6ce8dc.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096812-7d8c57a5-fbae-4496-8144-3b40ac74fef0.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096816-50f0ac3d-f2eb-4011-a34e-58e2e215b7b0.jpg " width = "400" /> </div>
|
@ -0,0 +1,34 @@
|
||||
## Demo images
|
||||
- Product recognition
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277277-7b29f596-35f6-4f00-8d2b-0ef0be57a090.jpg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277291-f7d2b2a1-5790-4f5b-a0e6-f5c52d04a69a.jpg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277300-8ce0d5ce-e0ca-46ea-bb9a-74df0df66ae3.jpg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277308-14a097bd-2bcd-41ce-a9e6-5e9cd0bd8b08.jpg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277311-208ae574-a708-46e2-a41e-c639322913b1.jpg" width = "400" /> </div>
|
||||
|
||||
[More demo images](product.md)
|
||||
|
||||
- Cartoon character recognition
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069108-ad54ae1d-610d-4cfa-9cd6-8ee8d280d61d.jpeg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069100-7539d292-1bd8-4655-8a6d-d1f2238bd618.jpeg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140069080-a821e0b7-8a10-4946-bf05-ff093cc16064.jpeg" width = "400" /> </div>
|
||||
|
||||
[More demo images](cartoon.md)
|
||||
|
||||
- Logo recognition
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096687-5b562e2d-0653-4be6-861d-1936a4440df2.jpeg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096701-4a4b2bd9-85f2-4d55-be4b-be6ab5e0fb81.jpeg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096706-ef4ad024-7284-4cb3-975a-779fd06b96f5.jpeg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096713-48e629aa-c637-4603-b005-18570fa94d6d.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096752-1f98c937-5d83-4c29-b495-01971b5fb258.jpeg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140096767-e8f701eb-d0e8-4304-b031-e2bff8c199f3.jpeg" width = "400" /> </div>
|
||||
|
||||
[More demo images](logo.md)
|
||||
|
||||
- Car recognition
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243899-c60f0a51-db9b-438a-9f2d-0d2893c200bb.jpeg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243905-7eeb938d-d88f-4540-a667-06e08dcf1f55.jpeg" width = "400" /> </div>
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243911-735a6ec0-a624-4965-b3cd-2b9f52fa8d65.jpeg" width = "400" /> </div>
|
||||
|
||||
[More demo images](vehicle.md)
|
@ -0,0 +1,179 @@
|
||||
# Product Demo Images
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277277-7b29f596-35f6-4f00-8d2b-0ef0be57a090.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277287-7bdad02a-8e3c-4e04-861c-95a5dae1f3c6.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277291-f7d2b2a1-5790-4f5b-a0e6-f5c52d04a69a.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277294-80aaab94-5109-41be-97f8-3ada73118963.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277296-2a8d7846-cd2e-454e-8b72-46233da09451.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277300-8ce0d5ce-e0ca-46ea-bb9a-74df0df66ae3.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277302-25c973eb-f9aa-42ce-b9e9-66cee738c241.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277303-3d3460da-c6aa-4994-b585-17bc9f3df504.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277306-20cbef71-cc58-4ae1-965b-4806e82988a9.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277308-14a097bd-2bcd-41ce-a9e6-5e9cd0bd8b08.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277309-be092d1c-6513-472c-8b7f-685f4353ae5b.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277311-208ae574-a708-46e2-a41e-c639322913b1.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277314-72901737-5ef5-4a23-820b-1db58c5e6ca0.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277318-aef4080c-24f2-4d92-be3c-45b500b75584.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277320-8046d0df-1256-41ce-a8d6-6d2c1292462c.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277321-e3864473-6a8e-485f-81f2-562b902d6cff.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277324-0aacc27f-699a-437b-bac0-4a20c90b47b1.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277328-8d28f754-8645-4c05-a9a6-0312bbe2f890.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277333-59da1513-e7e5-455c-ab73-7a3162216923.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277335-454c0423-5398-4348-aaab-e2652fd08999.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277338-a7d09c28-1b86-4cf5-bd79-99d51c5b5311.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277343-9c456d21-8018-4cd5-9c0b-cc7c087fac69.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277345-2ef780f1-d7c9-4cf2-a370-f220a052eb71.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277347-baa4b870-7fca-4d4c-8528-fad720270024.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277351-e0691080-ede4-49ae-9075-d36a41cebf25.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277354-509f8f85-f171-44e9-8ca1-4c3cae77b5fb.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277357-39d572b8-60ee-44db-9e0e-2c0ea2be2ed3.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277359-6caf33f6-2a38-48e5-b349-f4dd1ef2566b.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277362-260daa87-1db7-4f89-ba9c-1b32876fd3b6.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277366-14cfd2f9-d044-4288-843e-463a1816163e.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277368-b0e96341-e030-4e4d-8010-6f7c3bc94d2f.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277370-1f26e4e5-9988-4427-a035-44bfd9d472d6.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277372-27e60b60-cd5c-4b05-ae38-2e9524c627f3.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277374-bd292bb2-e1f9-4d5f-aa49-d67ac571d01b.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277377-b0b8cdb9-8196-4598-ae47-b615914bf6bf.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277382-fc89d18a-a57b-4331-adbb-bda3584fb122.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277386-d676467c-9846-4051-8192-b3e089d01cdc.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277390-83f66d3f-c415-47e6-b651-6b51fbe59bbf.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277394-9895d654-3163-4dd9-882c-ac5a893e2ad3.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277396-9e5e3aa3-6d9e-40ab-a325-2edea452156d.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277399-b92e2092-eabd-45c8-bf36-b2e238167892.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277404-285f943a-de70-48b8-9545-53e229b7350d.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277406-0b7ec434-f064-4985-80f3-c00735b3e32d.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277408-4f9b8b19-42c2-4ba4-bf6d-b95ababe0313.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277409-6df0faf7-71b7-4c9a-a875-36ae7ee7129d.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277411-9c2b364a-749d-465e-a85d-29a69e9ff3ef.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277413-c54a462c-dd3b-4ad0-985d-ef0ec1f216ec.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277414-6d922055-cd59-4f84-b5b6-651209d6336a.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277417-78e1322e-4039-4232-b217-1be4f207f804.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277419-181822a3-bae6-4c4f-9959-59e991c2df6c.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277422-76f09d84-cb47-4332-aa88-a12458cd8993.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277424-a72203b5-1a99-4464-a39c-245f7a891f25.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277429-521ac9a6-e4c3-4c74-9c5b-8e8dd6cddf34.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277433-4f9fb9c8-7930-4303-b54e-a6eace347923.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277434-f3aa3565-a2c5-4c1c-ab44-930a8b073b5f.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277437-90cf1cd7-6a62-4ac4-ac85-3aa534e50cee.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277439-54e168bc-9518-429e-9e97-cb9ca5e811c9.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277441-a3c277d7-c889-4556-b74a-400cadf8b771.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277442-22a0cd38-acd8-4b5a-8e59-c4bea852fb79.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277444-ea662034-c17f-47ba-9ea3-694d3cb0c880.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277448-a71f4a0a-c3cc-4432-a803-843b7c65307f.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277449-0b3a2e98-3e09-4bd6-be32-c35f44154e8a.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277452-e36ccc63-8e39-4973-a336-4ace855d25e6.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277454-bddd9527-b189-4771-ab9e-52085db5a44d.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277455-7ea277ba-bc75-48db-9567-40e1acb56f02.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277460-0f5ee4dc-5ece-45d5-8ef9-666f1be41b76.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277461-37cab773-6341-4c91-b1f4-780d530eab3b.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277465-8f53ef9d-0465-4a90-afac-b1dd3c970b72.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277467-655ddabe-cbe0-4d1f-a30e-c2965428e8d7.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277470-4587e905-3fc8-4dad-84ee-0844ba4d2474.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277473-a155944f-efe3-492a-babc-2f3fe700a99b.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277475-c95ab821-f5ae-427a-8721-8991f9c7f29f.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277479-55b59855-2ed6-4526-9481-6b92b25fef97.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277484-556f0e4c-007b-4f6a-b21f-c485f630cbcb.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277486-a39eb069-bc13-415e-b936-ba294216dfac.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277487-80952841-6a76-4fb3-8049-fe15ce8f7cfb.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277491-e892a6a8-6f9a-46c7-83e0-261cfb92d276.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277494-520f483e-654d-4399-9684-1fcd9778b76e.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277496-54b1ada5-e6a6-4654-a8a6-739511cec750.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277500-ff7e2afd-9cd7-484a-bd1e-362226f5197f.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277501-94489261-bea5-4492-bf3e-98cc8aaa7a7f.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277504-567a32bc-a573-4154-a9cd-6acbec923768.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277506-e893d4d5-43ce-4df1-9f08-3cdf6a8c7e2c.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277509-5766629f-bb92-4552-b34a-647e29b9a89b.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277511-8821708b-09f0-4aab-86dd-40ae3794697a.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277515-ed6a0dff-bd91-4233-a9af-e2744df7c7e0.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277519-1883d6a1-9348-4514-8924-dde27dd38704.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277524-b9d8515c-4df2-410a-b4a6-da098cb9da61.jpg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140277526-52a9c666-a799-4921-b371-41d97d7d9242.jpg" width = "400" /> </div>
|
@ -0,0 +1,33 @@
|
||||
# Vehicle Demo Images
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243899-c60f0a51-db9b-438a-9f2d-0d2893c200bb.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243904-fdbe2e01-dc7c-449a-8e9e-baea4f85fee4.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243905-7eeb938d-d88f-4540-a667-06e08dcf1f55.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243908-c7f1e3ea-92a7-429b-888c-732b9ec5398f.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243911-735a6ec0-a624-4965-b3cd-2b9f52fa8d65.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243913-baec489a-5463-472b-b5d1-418bcd4eb978.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243916-f50dfcdd-2d5f-48f9-876f-dbc05f4afa30.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243920-7a65ec82-8312-421e-985a-c394f11af28f.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243922-458e6dca-fb80-4baf-951e-9651080dc242.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243926-5df3036b-9ea1-441c-b30a-b4f847df25ab.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243927-7673d94a-fbb0-4a92-a3f3-c879a432a7db.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243928-91082855-c5a7-4a3f-aeea-7a2e51e43183.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243929-88fe7efa-b212-4105-af2f-2248a6cb2877.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243933-49e71d02-8228-40ec-99b2-3ed862bf4ba5.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243935-530fbfa3-0d34-4d9d-bd59-2fde5659f7e5.jpeg" width = "400" /> </div>
|
||||
|
||||
<div align="center"> <img src="https://user-images.githubusercontent.com/12560511/140243940-d289fc7d-d343-4aa5-a807-9ce09a241ccd.jpeg" width = "400" /> </div>
|
@ -0,0 +1,35 @@
|
||||
@ECHO OFF
|
||||
|
||||
pushd %~dp0
|
||||
|
||||
REM Command file for Sphinx documentation
|
||||
|
||||
if "%SPHINXBUILD%" == "" (
|
||||
set SPHINXBUILD=sphinx-build
|
||||
)
|
||||
set SOURCEDIR=.
|
||||
set BUILDDIR=_build
|
||||
|
||||
if "%1" == "" goto help
|
||||
|
||||
%SPHINXBUILD% >NUL 2>NUL
|
||||
if errorlevel 9009 (
|
||||
echo.
|
||||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
|
||||
echo.installed, then set the SPHINXBUILD environment variable to point
|
||||
echo.to the full path of the 'sphinx-build' executable. Alternatively you
|
||||
echo.may add the Sphinx directory to PATH.
|
||||
echo.
|
||||
echo.If you don't have Sphinx installed, grab it from
|
||||
echo.http://sphinx-doc.org/
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
|
||||
goto end
|
||||
|
||||
:help
|
||||
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
|
||||
|
||||
:end
|
||||
popd
|
@ -0,0 +1,27 @@
|
||||
# DLA series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
DLA (Deep Layer Aggregation) starts from the observation that visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves the inference of what and where. Although skip connections have been incorporated to combine layers, these connections have themselves been "shallow" and only fuse by simple, one-step operations. The authors augment standard architectures with deeper aggregation to better fuse information across layers. Deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to produce networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared with existing branching and merging schemes. [paper](https://arxiv.org/abs/1707.06484)
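
As a rough sketch of the aggregation idea (not the exact DLA implementation; the `AggregationNode` name, channel counts, and shapes below are illustrative assumptions only), an aggregation node can be written as a small Paddle layer that fuses several same-resolution feature maps with concatenation followed by a 1x1 convolution:

```python
import paddle
import paddle.nn as nn

class AggregationNode(nn.Layer):
    """Toy aggregation node: fuse several same-resolution feature maps
    by channel-wise concatenation followed by a 1x1 conv + BN + ReLU."""
    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        self.conv = nn.Conv2D(sum(in_channels_list), out_channels, kernel_size=1)
        self.bn = nn.BatchNorm2D(out_channels)
        self.relu = nn.ReLU()

    def forward(self, features):
        x = paddle.concat(features, axis=1)  # merge along the channel axis
        return self.relu(self.bn(self.conv(x)))

# Fuse the outputs of two hypothetical stages with 64 channels each.
f1 = paddle.randn([1, 64, 28, 28])
f2 = paddle.randn([1, 64, 28, 28])
node = AggregationNode([64, 64], out_channels=128)
print(node([f1, f2]).shape)  # [1, 128, 28, 28]
```

In DLA such nodes are applied iteratively and hierarchically across stages, which is what distinguishes it from a single shallow skip connection.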
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|
||||
|:-----------------:|:----------:|:---------:|:---------:|:---------:|
|
||||
| DLA34 | 15.8 | 3.1 | 76.03 | 92.98 |
|
||||
| DLA46_c | 1.3 | 0.5 | 63.21 | 85.30 |
|
||||
| DLA46x_c | 1.1 | 0.5 | 64.36 | 86.01 |
|
||||
| DLA60 | 22.0 | 4.2 | 76.10 | 92.92 |
|
||||
| DLA60x | 17.4 | 3.5 | 77.53 | 93.78 |
|
||||
| DLA60x_c | 1.3 | 0.6 | 66.45 | 87.54 |
|
||||
| DLA102 | 33.3 | 7.2 | 78.93 | 94.52 |
|
||||
| DLA102x | 26.4 | 5.9 | 78.10 | 94.00 |
|
||||
| DLA102x2 | 41.4 | 9.3 | 78.85 | 94.45 |
|
||||
| DLA169 | 53.5 | 11.6 | 78.09 | 94.09 |
|
@ -0,0 +1,78 @@
|
||||
# DPN and DenseNet series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
* [3. Inference speed based on V100 GPU](#3)
|
||||
* [4. Inference speed based on T4 GPU](#4)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
DenseNet is a network architecture proposed in 2017 that won the CVPR Best Paper Award. The network designs a new cross-layer connected block called the dense block. Compared with the bottleneck block in ResNet, the dense block uses a much more aggressive dense connection pattern: every layer is connected to all of the layers before it and receives their outputs as additional input. DenseNet stacks these dense blocks into a densely connected network. The dense connections make gradients easier to backpropagate, so the network is easier to train and converges more readily. DPN, short for Dual Path Networks, is a network that combines DenseNet and ResNeXt. The authors show that DenseNet can extract new features from previous levels, while ResNeXt essentially reuses features that have already been extracted. They further find that ResNeXt has a high feature reuse rate but low redundancy, while DenseNet creates new features but with high redundancy. Combining the advantages of the two structures, the authors designed the DPN network. In the end, DPN achieves better results than both ResNeXt and DenseNet under the same FLOPs and parameter counts.
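
To make the dense connection pattern concrete, here is a minimal sketch (illustrative only, not the PaddleClas implementation; `ToyDenseBlock`, the growth rate, and the channel numbers are assumptions for this example) of a block in which every layer receives the concatenation of all earlier outputs:

```python
import paddle
import paddle.nn as nn

class ToyDenseBlock(nn.Layer):
    """Minimal dense block: every layer takes the concatenation of all
    previous feature maps as input and appends `growth_rate` new channels."""
    def __init__(self, in_channels, growth_rate=32, num_layers=4):
        super().__init__()
        self.layers = nn.LayerList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2D(channels),
                nn.ReLU(),
                nn.Conv2D(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(paddle.concat(features, axis=1))  # dense connection
            features.append(out)
        return paddle.concat(features, axis=1)

x = paddle.randn([1, 64, 56, 56])
print(ToyDenseBlock(64)(x).shape)  # [1, 192, 56, 56] -> 64 + 4 * 32 channels
```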
|
||||
|
||||
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
The pretrained models of these two series (10 in total) are currently open-sourced in PaddleClas, and their metrics are shown in the figures above. It is easy to observe that, under the same FLOPs and parameter counts, DPN reaches higher accuracy than DenseNet. However, because DPN has more branches, its inference speed is slower than DenseNet's. Since DenseNet264 is the deepest network in the DenseNet family, it has the most parameters, while DenseNet161 has the largest width, which results in the largest FLOPs and the highest accuracy in this series. From the perspective of inference speed, DenseNet161, despite its large FLOPs and high accuracy, is faster than DenseNet264, so it has a clear advantage over DenseNet264.
|
||||
|
||||
For DPN series networks, the larger the model's FLOPs and parameters, the higher the model's accuracy. Among them, since the width of DPN107 is the largest, it has the largest number of parameters and FLOPs in this series of networks.
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| DenseNet121 | 0.757 | 0.926 | 0.750 | | 5.690 | 7.980 |
|
||||
| DenseNet161 | 0.786 | 0.941 | 0.778 | | 15.490 | 28.680 |
|
||||
| DenseNet169 | 0.768 | 0.933 | 0.764 | | 6.740 | 14.150 |
|
||||
| DenseNet201 | 0.776 | 0.937 | 0.775 | | 8.610 | 20.010 |
|
||||
| DenseNet264 | 0.780 | 0.939 | 0.779 | | 11.540 | 33.370 |
|
||||
| DPN68 | 0.768 | 0.934 | 0.764 | 0.931 | 4.030 | 10.780 |
|
||||
| DPN92 | 0.799 | 0.948 | 0.793 | 0.946 | 12.540 | 36.290 |
|
||||
| DPN98 | 0.806 | 0.951 | 0.799 | 0.949 | 22.220 | 58.460 |
|
||||
| DPN107 | 0.809 | 0.953 | 0.802 | 0.951 | 35.060 | 82.970 |
|
||||
| DPN131 | 0.807 | 0.951 | 0.801 | 0.949 | 30.510 | 75.360 |
|
||||
|
||||
|
||||
|
||||
<a name='3'></a>
|
||||
## 3. Inference speed based on V100 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|
||||
|-------------|-----------|-------------------|--------------------------|
|
||||
| DenseNet121 | 224 | 256 | 4.371 |
|
||||
| DenseNet161 | 224 | 256 | 8.863 |
|
||||
| DenseNet169 | 224 | 256 | 6.391 |
|
||||
| DenseNet201 | 224 | 256 | 8.173 |
|
||||
| DenseNet264 | 224 | 256 | 11.942 |
|
||||
| DPN68 | 224 | 256 | 11.805 |
|
||||
| DPN92 | 224 | 256 | 17.840 |
|
||||
| DPN98 | 224 | 256 | 21.057 |
|
||||
| DPN107 | 224 | 256 | 28.685 |
|
||||
| DPN131 | 224 | 256 | 28.083 |
|
||||
|
||||
|
||||
<a name='4'></a>
|
||||
## 4. Inference speed based on T4 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|
||||
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
|
||||
| DenseNet121 | 224 | 256 | 4.16436 | 7.2126 | 10.50221 | 4.40447 | 9.32623 | 15.25175 |
|
||||
| DenseNet161 | 224 | 256 | 9.27249 | 14.25326 | 20.19849 | 10.39152 | 22.15555 | 35.78443 |
|
||||
| DenseNet169 | 224 | 256 | 6.11395 | 10.28747 | 13.68717 | 6.43598 | 12.98832 | 20.41964 |
|
||||
| DenseNet201 | 224 | 256 | 7.9617 | 13.4171 | 17.41949 | 8.20652 | 17.45838 | 27.06309 |
|
||||
| DenseNet264 | 224 | 256 | 11.70074 | 19.69375 | 24.79545 | 12.14722 | 26.27707 | 40.01905 |
|
||||
| DPN68 | 224 | 256 | 11.7827 | 13.12652 | 16.19213 | 11.64915 | 12.82807 | 18.57113 |
|
||||
| DPN92 | 224 | 256 | 18.56026 | 20.35983 | 29.89544 | 18.15746 | 23.87545 | 38.68821 |
|
||||
| DPN98 | 224 | 256 | 21.70508 | 24.7755 | 40.93595 | 21.18196 | 33.23925 | 62.77751 |
|
||||
| DPN107 | 224 | 256 | 27.84462 | 34.83217 | 60.67903 | 27.62046 | 52.65353 | 100.11721 |
|
||||
| DPN131 | 224 | 256 | 28.58941 | 33.01078 | 55.65146 | 28.33119 | 46.19439 | 89.24904 |
|
@ -0,0 +1,23 @@
|
||||
# ESNet Series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
ESNet (Enhanced ShuffleNet) is a lightweight network developed by Baidu. On the basis of ShuffleNetV2, it combines the advantages of MobileNetV3, GhostNet, and PPLCNet to form a faster and more accurate network for ARM devices. Because of its excellent performance, [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet), launched in PaddleDetection, uses this model as its backbone; together with a stronger object detection algorithm, it refreshed the SOTA mAP of object detection models on ARM devices in one fell swoop.
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | FLOPs<br>(M) | Params<br/>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|
|
||||
| ESNet_x0_25 | 62.48 | 83.46 | 30.9 | 2.83 |
|
||||
| ESNet_x0_5 | 68.82 | 88.04 | 67.3 | 3.25 |
|
||||
| ESNet_x0_75 | 72.24 | 90.45 | 123.7 | 3.87 |
|
||||
| ESNet_x1_0 | 73.92 | 91.40 | 197.3 | 4.64 |
|
||||
|
||||
Please stay tuned for further information such as inference speed.
|
@ -0,0 +1,91 @@
|
||||
# EfficientNet and ResNeXt101_wsl series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
* [3. Inference speed based on V100 GPU](#3)
|
||||
* [4. Inference speed based on T4 GPU](#4)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
EfficientNet is a lightweight NAS-based network released by Google in 2019, and EfficientNetB7 refreshed the ImageNet-1k classification accuracy at that time. In the paper, the authors point out that traditional methods for improving the performance of a neural network usually scale one of three dimensions: the network width, the network depth, or the resolution of the input image.
|
||||
However, through experiments the authors found that balancing these three dimensions is essential for improving both accuracy and efficiency.
|
||||
Therefore, through a series of experiments, the authors summarized a compound scaling rule that balances all three dimensions at the same time.
|
||||
Based on this scaling method, the authors built seven additional networks, EfficientNetB1-B7, on top of EfficientNetB0, and the series reached state-of-the-art accuracy under comparable FLOPs and parameter budgets.
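
A minimal sketch of the compound scaling rule, assuming the coefficients reported in the EfficientNet paper (alpha = 1.2, beta = 1.1, gamma = 1.15, chosen so that alpha * beta^2 * gamma^2 is roughly 2); the function below is illustrative and not part of PaddleClas:

```python
def compound_scaling(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for a given phi,
    following the compound scaling rule of the EfficientNet paper."""
    depth_mult = alpha ** phi        # how many more layers
    width_mult = beta ** phi         # how many more channels
    resolution_mult = gamma ** phi   # how much larger the input image
    return depth_mult, width_mult, resolution_mult

for phi in range(0, 4):
    d, w, r = compound_scaling(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```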
|
||||
|
||||
ResNeXt is an improved version of ResNet proposed by Facebook in 2016. In 2019, Facebook researchers studied the accuracy limit of this family on ImageNet through weakly supervised learning. To distinguish these models from the earlier ResNeXt networks, the series carries the suffix wsl, short for weakly supervised learning. To obtain stronger feature extraction capability, the researchers further enlarged the network width; the largest model, ResNeXt101_32x48d_wsl, has 800 million parameters. It was pretrained on 940 million weakly labeled images and then fine-tuned on ImageNet-1k, finally reaching 85.4% top-1 accuracy on ImageNet-1k, which was the highest accuracy at 224x224 resolution at the time. In Fix-ResNeXt, the authors used a larger image resolution and applied a special Fix strategy to handle the inconsistency between training and testing image preprocessing, which gives ResNeXt101_32x48d_wsl an even higher accuracy. Since it uses the Fix strategy, it is named Fix_ResNeXt101_32x48d_wsl.
|
||||
|
||||
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
At present, PaddleClas provides a total of 14 pretrained models for these two series. As the figures above show, the advantage of the EfficientNet series is very obvious. The ResNeXt101_wsl series uses much more training data, so its final accuracy is also higher. EfficientNetB0_small removes the SE block from EfficientNetB0, which gives it a faster inference speed.
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| ResNeXt101_<br>32x8d_wsl | 0.826 | 0.967 | 0.822 | 0.964 | 29.140 | 78.440 |
|
||||
| ResNeXt101_<br>32x16d_wsl | 0.842 | 0.973 | 0.842 | 0.972 | 57.550 | 152.660 |
|
||||
| ResNeXt101_<br>32x32d_wsl | 0.850 | 0.976 | 0.851 | 0.975 | 115.170 | 303.110 |
|
||||
| ResNeXt101_<br>32x48d_wsl | 0.854 | 0.977 | 0.854 | 0.976 | 173.580 | 456.200 |
|
||||
| Fix_ResNeXt101_<br>32x48d_wsl | 0.863 | 0.980 | 0.864 | 0.980 | 354.230 | 456.200 |
|
||||
| EfficientNetB0 | 0.774 | 0.933 | 0.773 | 0.935 | 0.720 | 5.100 |
|
||||
| EfficientNetB1 | 0.792 | 0.944 | 0.792 | 0.945 | 1.270 | 7.520 |
|
||||
| EfficientNetB2 | 0.799 | 0.947 | 0.803 | 0.950 | 1.850 | 8.810 |
|
||||
| EfficientNetB3 | 0.812 | 0.954 | 0.817 | 0.956 | 3.430 | 11.840 |
|
||||
| EfficientNetB4 | 0.829 | 0.962 | 0.830 | 0.963 | 8.290 | 18.760 |
|
||||
| EfficientNetB5 | 0.836 | 0.967 | 0.837 | 0.967 | 19.510 | 29.610 |
|
||||
| EfficientNetB6 | 0.840 | 0.969 | 0.842 | 0.968 | 36.270 | 42.000 |
|
||||
| EfficientNetB7 | 0.843 | 0.969 | 0.844 | 0.971 | 72.350 | 64.920 |
|
||||
| EfficientNetB0_<br>small | 0.758 | 0.926 | | | 0.720 | 4.650 |
|
||||
|
||||
<a name='3'></a>
|
||||
## 3. Inference speed based on V100 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|
||||
|-------------------------------|-----------|-------------------|--------------------------|
|
||||
| ResNeXt101_<br>32x8d_wsl | 224 | 256 | 19.127 |
|
||||
| ResNeXt101_<br>32x16d_wsl | 224 | 256 | 23.629 |
|
||||
| ResNeXt101_<br>32x32d_wsl | 224 | 256 | 40.214 |
|
||||
| ResNeXt101_<br>32x48d_wsl | 224 | 256 | 59.714 |
|
||||
| Fix_ResNeXt101_<br>32x48d_wsl | 320 | 320 | 82.431 |
|
||||
| EfficientNetB0 | 224 | 256 | 2.449 |
|
||||
| EfficientNetB1 | 240 | 272 | 3.547 |
|
||||
| EfficientNetB2 | 260 | 292 | 3.908 |
|
||||
| EfficientNetB3 | 300 | 332 | 5.145 |
|
||||
| EfficientNetB4 | 380 | 412 | 7.609 |
|
||||
| EfficientNetB5 | 456 | 488 | 12.078 |
|
||||
| EfficientNetB6 | 528 | 560 | 18.381 |
|
||||
| EfficientNetB7 | 600 | 632 | 27.817 |
|
||||
| EfficientNetB0_<br>small | 224 | 256 | 1.692 |
|
||||
|
||||
|
||||
<a name='4'></a>
|
||||
## 4. Inference speed based on T4 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|
||||
|---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
|
||||
| ResNeXt101_<br>32x8d_wsl | 224 | 256 | 18.19374 | 21.93529 | 34.67802 | 18.52528 | 34.25319 | 67.2283 |
|
||||
| ResNeXt101_<br>32x16d_wsl | 224 | 256 | 18.52609 | 36.8288 | 62.79947 | 25.60395 | 71.88384 | 137.62327 |
|
||||
| ResNeXt101_<br>32x32d_wsl | 224 | 256 | 33.51391 | 70.09682 | 125.81884 | 54.87396 | 160.04337 | 316.17718 |
|
||||
| ResNeXt101_<br>32x48d_wsl | 224 | 256 | 50.97681 | 137.60926 | 190.82628 | 99.01698256 | 315.91261 | 551.83695 |
|
||||
| Fix_ResNeXt101_<br>32x48d_wsl | 320 | 320 | 78.62869 | 191.76039 | 317.15436 | 160.0838242 | 595.99296 | 1151.47384 |
|
||||
| EfficientNetB0 | 224 | 256 | 3.40122 | 5.95851 | 9.10801 | 3.442 | 6.11476 | 9.3304 |
|
||||
| EfficientNetB1 | 240 | 272 | 5.25172 | 9.10233 | 14.11319 | 5.3322 | 9.41795 | 14.60388 |
|
||||
| EfficientNetB2 | 260 | 292 | 5.91052 | 10.5898 | 17.38106 | 6.29351 | 10.95702 | 17.75308 |
|
||||
| EfficientNetB3 | 300 | 332 | 7.69582 | 16.02548 | 27.4447 | 7.67749 | 16.53288 | 28.5939 |
|
||||
| EfficientNetB4 | 380 | 412 | 11.55585 | 29.44261 | 53.97363 | 12.15894 | 30.94567 | 57.38511 |
|
||||
| EfficientNetB5 | 456 | 488 | 19.63083 | 56.52299 | - | 20.48571 | 61.60252 | - |
|
||||
| EfficientNetB6 | 528 | 560 | 30.05911 | - | - | 32.62402 | - | - |
|
||||
| EfficientNetB7 | 600 | 632 | 47.86087 | - | - | 53.93823 | - | - |
|
||||
| EfficientNetB0_small | 224 | 256 | 2.39166 | 4.36748 | 6.96002 | 2.3076 | 4.71886 | 7.21888 |
|
@ -0,0 +1,75 @@
|
||||
# HRNet series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
* [3. Inference speed based on V100 GPU](#3)
|
||||
* [4. Inference speed based on T4 GPU](#4)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
HRNet is a neural network proposed by Microsoft Research Asia in 2019. Unlike previous convolutional neural networks, it maintains a high-resolution representation even in the deep layers of the network, so the predicted keypoint heatmaps are more accurate and spatially more precise. In addition, the network performs particularly well on other resolution-sensitive visual tasks such as detection and segmentation.
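
A rough illustration of one high-resolution fusion step (not the full HRNet exchange unit; `ToyFuseLowToHigh`, the channel counts, and the shapes are assumptions for this sketch): a low-resolution branch is projected with a 1x1 convolution, upsampled, and added to the high-resolution branch, so high resolution is kept while low-resolution semantics are mixed in:

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class ToyFuseLowToHigh(nn.Layer):
    """Sketch of one HRNet-style fusion step: a low-resolution feature map
    is projected with a 1x1 conv, upsampled, and added to the high-res map."""
    def __init__(self, low_channels, high_channels):
        super().__init__()
        self.proj = nn.Conv2D(low_channels, high_channels, kernel_size=1)

    def forward(self, high, low):
        low = F.interpolate(self.proj(low), size=high.shape[2:], mode="nearest")
        return high + low

high = paddle.randn([1, 32, 56, 56])   # high-resolution branch
low = paddle.randn([1, 64, 28, 28])    # low-resolution branch
print(ToyFuseLowToHigh(64, 32)(high, low).shape)  # [1, 32, 56, 56]
```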
|
||||
|
||||
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
At present, PaddleClas provides 7 pretrained models of this series, and their metrics are shown in the figures above. Among them, the slightly abnormal accuracy of HRNet_W48_C may be caused by fluctuations during training.
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| HRNet_W18_C | 0.769 | 0.934 | 0.768 | 0.934 | 4.140 | 21.290 |
|
||||
| HRNet_W18_C_ssld | 0.816 | 0.958 | 0.768 | 0.934 | 4.140 | 21.290 |
|
||||
| HRNet_W30_C | 0.780 | 0.940 | 0.782 | 0.942 | 16.230 | 37.710 |
|
||||
| HRNet_W32_C | 0.783 | 0.942 | 0.785 | 0.942 | 17.860 | 41.230 |
|
||||
| HRNet_W40_C | 0.788 | 0.945 | 0.789 | 0.945 | 25.410 | 57.550 |
|
||||
| HRNet_W44_C | 0.790 | 0.945 | 0.789 | 0.944 | 29.790 | 67.060 |
|
||||
| HRNet_W48_C | 0.790 | 0.944 | 0.793 | 0.945 | 34.580 | 77.470 |
|
||||
| HRNet_W48_C_ssld | 0.836 | 0.968 | 0.793 | 0.945 | 34.580 | 77.470 |
|
||||
| HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 |
|
||||
| SE_HRNet_W64_C_ssld | 0.847 | 0.973 | | | 57.830 | 128.970 |
|
||||
|
||||
<a name='3'></a>
|
||||
## 3. Inference speed based on V100 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|
||||
|-------------|-----------|-------------------|--------------------------|
|
||||
| HRNet_W18_C | 224 | 256 | 7.368 |
|
||||
| HRNet_W18_C_ssld | 224 | 256 | 7.368 |
|
||||
| HRNet_W30_C | 224 | 256 | 9.402 |
|
||||
| HRNet_W32_C | 224 | 256 | 9.467 |
|
||||
| HRNet_W40_C | 224 | 256 | 10.739 |
|
||||
| HRNet_W44_C | 224 | 256 | 11.497 |
|
||||
| HRNet_W48_C | 224 | 256 | 12.165 |
|
||||
| HRNet_W48_C_ssld | 224 | 256 | 12.165 |
|
||||
| HRNet_W64_C | 224 | 256 | 15.003 |
|
||||
|
||||
|
||||
|
||||
<a name='4'></a>
|
||||
## 4. Inference speed based on T4 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|
||||
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
|
||||
| HRNet_W18_C | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 |
|
||||
| HRNet_W18_C_ssld | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 |
|
||||
| HRNet_W30_C | 224 | 256 | 8.98077 | 14.08082 | 21.23527 | 9.57594 | 17.35485 | 32.6933 |
|
||||
| HRNet_W32_C | 224 | 256 | 8.82415 | 14.21462 | 21.19804 | 9.49807 | 17.72921 | 32.96305 |
|
||||
| HRNet_W40_C | 224 | 256 | 11.4229 | 19.1595 | 30.47984 | 12.12202 | 25.68184 | 48.90623 |
|
||||
| HRNet_W44_C | 224 | 256 | 12.25778 | 22.75456 | 32.61275 | 13.19858 | 32.25202 | 59.09871 |
|
||||
| HRNet_W48_C | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 |
|
||||
| HRNet_W48_C_ssld | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 |
|
||||
| HRNet_W64_C | 224 | 256 | 15.10428 | 27.68901 | 40.4198 | 17.57527 | 47.9533 | 97.11228 |
|
||||
| SE_HRNet_W64_C_ssld | 224 | 256 | 32.33651 | 69.31189 | 116.07245 | 31.69770 | 94.99546 | 174.45766 |
|
||||
|
@ -0,0 +1,74 @@
|
||||
# Inception series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
* [3. Inference speed based on V100 GPU](#3)
|
||||
* [4. Inference speed based on T4 GPU](#4)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
GoogLeNet is a neural network architecture designed by Google in 2014 which, together with the VGG network, stood out as one of the two dominant networks of that year's ImageNet challenge. GoogLeNet introduced the Inception structure for the first time and stacked Inception modules so that the network reached 22 layers, which also marked the first time a convolutional network exceeded 20 layers. Because 1x1 convolutions are used inside the Inception module to reduce the channel dimension, and global pooling replaces the multiple fully connected layers traditionally used to process the final features, GoogLeNet has far fewer FLOPs and parameters than VGG, which made it a landmark of neural network design at the time.
|
||||
|
||||
InceptionV3 is Google's improvement on InceptionV2. The authors first optimized the Inception module in InceptionV3 and designed and used more variants of it. Furthermore, the larger square two-dimensional convolution kernels in some Inception modules were factorized into two smaller asymmetric convolution kernels, which greatly reduces the number of parameters.
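
A small back-of-the-envelope calculation (illustrative channel numbers, biases ignored, and the intermediate layer is assumed to keep the same channel width) shows why replacing an n x n convolution with a 1 x n convolution followed by an n x 1 convolution saves parameters:

```python
def conv_params(k_h, k_w, in_ch, out_ch):
    """Weight count of a single convolution layer (biases ignored)."""
    return k_h * k_w * in_ch * out_ch

in_ch = out_ch = 192  # illustrative channel width
full = conv_params(7, 7, in_ch, out_ch)
factored = conv_params(1, 7, in_ch, out_ch) + conv_params(7, 1, out_ch, out_ch)
print(f"7x7 conv:        {full:,} weights")
print(f"1x7 + 7x1 convs: {factored:,} weights ({factored / full:.0%} of the original)")
```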
|
||||
|
||||
Xception is a further improvement on InceptionV3 proposed by Google after the Inception series. In Xception, the authors replaced the traditional convolution operation with depthwise separable convolution, which greatly reduces the network's FLOPs and parameter count while improving accuracy. In DeepLabV3+, the authors further improved Xception by increasing the number of layers, resulting in the Xception65 and Xception71 networks.
|
||||
|
||||
InceptionV4 is a neural network designed by Google in 2016. Residual structures were all the rage at the time, but the authors believed that high performance could be achieved with the Inception structure alone, so InceptionV4 uses more Inception modules to achieve even higher precision on ImageNet-1k.
|
||||
|
||||
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
The figure above shows the relationship between accuracy and the other metrics for the Xception series and InceptionV4. Among them, Xception_deeplab is consistent with the structure in the paper, while Xception is an improved model developed by PaddleClas, which improves accuracy by about 0.6% while keeping the inference speed basically unchanged. Details of the improved model are still being documented, so stay tuned.
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| GoogLeNet | 0.707 | 0.897 | 0.698 | | 2.880 | 8.460 |
|
||||
| Xception41 | 0.793 | 0.945 | 0.790 | 0.945 | 16.740 | 22.690 |
|
||||
| Xception41<br>_deeplab | 0.796 | 0.944 | | | 18.160 | 26.730 |
|
||||
| Xception65 | 0.810 | 0.955 | | | 25.950 | 35.480 |
|
||||
| Xception65<br>_deeplab | 0.803 | 0.945 | | | 27.370 | 39.520 |
|
||||
| Xception71 | 0.811 | 0.955 | | | 31.770 | 37.280 |
|
||||
| InceptionV3 | 0.791 | 0.946 | 0.788 | 0.944 | 11.460 | 23.830 |
|
||||
| InceptionV4 | 0.808 | 0.953 | 0.800 | 0.950 | 24.570 | 42.680 |
|
||||
|
||||
|
||||
<a name='3'></a>
|
||||
## 3. Inference speed based on V100 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|
||||
|------------------------|-----------|-------------------|--------------------------|
|
||||
| GoogLeNet | 224 | 256 | 1.807 |
|
||||
| Xception41 | 299 | 320 | 3.972 |
|
||||
| Xception41_<br>deeplab | 299 | 320 | 4.408 |
|
||||
| Xception65 | 299 | 320 | 6.174 |
|
||||
| Xception65_<br>deeplab | 299 | 320 | 6.464 |
|
||||
| Xception71 | 299 | 320 | 6.782 |
|
||||
| InceptionV4 | 299 | 320 | 11.141 |
|
||||
|
||||
|
||||
<a name='4'></a>
|
||||
## 4. Inference speed based on T4 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|
||||
|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
|
||||
| GoogLeNet | 299 | 320 | 1.75451 | 3.39931 | 4.71909 | 1.88038 | 4.48882 | 6.94035 |
|
||||
| Xception41 | 299 | 320 | 2.91192 | 7.86878 | 15.53685 | 4.96939 | 17.01361 | 32.67831 |
|
||||
| Xception41_<br>deeplab | 299 | 320 | 2.85934 | 7.2075 | 14.01406 | 5.33541 | 17.55938 | 33.76232 |
|
||||
| Xception65 | 299 | 320 | 4.30126 | 11.58371 | 23.22213 | 7.26158 | 25.88778 | 53.45426 |
|
||||
| Xception65_<br>deeplab | 299 | 320 | 4.06803 | 9.72694 | 19.477 | 7.60208 | 26.03699 | 54.74724 |
|
||||
| Xception71 | 299 | 320 | 4.80889 | 13.5624 | 27.18822 | 8.72457 | 31.55549 | 69.31018 |
|
||||
| InceptionV3 | 299 | 320 | 3.67502 | 6.36071 | 9.82645 | 6.64054 | 13.53630 | 22.17355 |
|
||||
| InceptionV4 | 299 | 320 | 9.50821 | 13.72104 | 20.27447 | 12.99342 | 25.23416 | 43.56121 |
|
@ -0,0 +1,27 @@
|
||||
# MixNet series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
MixNet is a lightweight network proposed by Google. The main idea of MixNet is to explore combining kernels of different sizes. The authors found that existing networks have the following two problems:
|
||||
|
||||
- A small convolution kernel has a small receptive field and few parameters, but its accuracy is limited.
|
||||
- A large convolution kernel has a larger receptive field and higher accuracy, but its parameter count also increases considerably.
|
||||
|
||||
To solve these two problems, MDConv (mixed depthwise convolution) is proposed: kernels of different sizes are mixed within a single convolution block. Based on an AutoML search, a series of networks called MixNets is then proposed, which achieve good results on ImageNet. [paper](https://arxiv.org/pdf/1907.09595.pdf)
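
A minimal sketch of the mixed depthwise convolution idea (illustrative only, not the MixNet implementation; `ToyMDConv`, the kernel sizes, and the channel counts are assumptions): channels are split into equal groups and each group gets a depthwise convolution with its own kernel size before the results are concatenated:

```python
import paddle
import paddle.nn as nn

class ToyMDConv(nn.Layer):
    """Sketch of mixed depthwise convolution: channels are split into equal
    groups and each group uses a depthwise conv with its own kernel size."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        self.split = channels // len(kernel_sizes)
        self.convs = nn.LayerList([
            nn.Conv2D(self.split, self.split, kernel_size=k,
                      padding=k // 2, groups=self.split)  # depthwise: groups == channels
            for k in kernel_sizes
        ])

    def forward(self, x):
        chunks = paddle.split(x, len(self.convs), axis=1)
        return paddle.concat([conv(c) for conv, c in zip(self.convs, chunks)], axis=1)

x = paddle.randn([1, 48, 32, 32])
print(ToyMDConv(48)(x).shape)  # [1, 48, 32, 32]
```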
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | FLOPs<br>(M) | Params<br>(M) |
|
||||
| :------: | :---: | :---: | :---------------: | :----------: | :-----------: |
|
||||
| MixNet_S | 76.28 | 92.99 | 75.8 | 252.977 | 4.167 |
|
||||
| MixNet_M | 77.67 | 93.64 | 77.0 | 357.119 | 5.065 |
|
||||
| MixNet_L | 78.60 | 94.37 | 78.9 | 579.017 | 7.384 |
|
||||
|
||||
Inference speed and other information are coming soon.
|
@ -0,0 +1,158 @@
|
||||
# Mobile and Embedded Vision Applications Network series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
* [3. Inference speed and storage size based on SD855](#3)
|
||||
* [4. Inference speed based on T4 GPU](#4)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
MobileNetV1 is a network launched by Google in 2017 for use on mobile or embedded devices. The network replaces the traditional convolution operation with depthwise separable convolution, that is, the combination of a depthwise convolution and a pointwise convolution. Compared with the traditional convolution operation, this combination greatly reduces the number of parameters and the amount of computation. At the same time, MobileNetV1 can also be used for object detection, image segmentation and other visual tasks.
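
The parameter saving of replacing a standard convolution with a depthwise separable one can be seen with a quick calculation (the channel numbers below are illustrative assumptions, biases ignored):

```python
def standard_conv_params(k, in_ch, out_ch):
    return k * k * in_ch * out_ch                 # one k x k filter per output channel

def depthwise_separable_params(k, in_ch, out_ch):
    depthwise = k * k * in_ch                     # one k x k filter per input channel
    pointwise = in_ch * out_ch                    # 1x1 conv mixes the channels
    return depthwise + pointwise

k, in_ch, out_ch = 3, 128, 256
std = standard_conv_params(k, in_ch, out_ch)
sep = depthwise_separable_params(k, in_ch, out_ch)
print(f"standard 3x3:        {std:,} weights")
print(f"depthwise separable: {sep:,} weights ({sep / std:.1%} of the original)")
```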
|
||||
|
||||
MobileNetV2 is a lightweight network proposed by Google as the successor to MobileNetV1. Compared with MobileNetV1, MobileNetV2 introduces linear bottlenecks and the inverted residual block as its basic building units, and the MobileNetV2 architecture is built by stacking many of these modules. In the end, it achieves higher classification accuracy with only half the FLOPs of MobileNetV1.
|
||||
|
||||
The ShuffleNet series is a family of lightweight network structures proposed by MEGVII. So far, there are two typical structures in this series, namely ShuffleNetV1 and ShuffleNetV2. The Channel Shuffle operation in ShuffleNet exchanges information between groups and enables end-to-end training. In the ShuffleNetV2 paper, the authors propose four criteria for designing lightweight networks, and design the ShuffleNetV2 network according to these criteria and the shortcomings of ShuffleNetV1.
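The Channel Shuffle operation itself is just a reshape and transpose; a minimal sketch in PaddlePaddle is shown below (the tensor shapes are illustrative).

```python
import paddle

def channel_shuffle(x, groups):
    """Interleave channels across groups so grouped convolutions can exchange information."""
    n, c, h, w = x.shape
    x = paddle.reshape(x, [n, groups, c // groups, h, w])
    x = paddle.transpose(x, [0, 2, 1, 3, 4])   # swap the group axis and the within-group axis
    return paddle.reshape(x, [n, c, h, w])

x = paddle.rand([1, 8, 4, 4])
print(channel_shuffle(x, groups=2).shape)      # [1, 8, 4, 4]
```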
|
||||
|
||||
MobileNetV3 is a NAS-based lightweight network proposed by Google in 2019. To further improve the accuracy, the ReLU and sigmoid activation functions are replaced with hard_swish and hard_sigmoid, and several other strategies are introduced to reduce the amount of computation.
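The hard activations replace the exponential in sigmoid/swish with a piecewise-linear approximation, which is cheaper on mobile hardware. Below is a sketch of the ReLU6-based form used in MobileNetV3, written directly from the formulas rather than calling a framework activation.

```python
import paddle

def hard_sigmoid(x):
    # relu6(x + 3) / 6, a piecewise-linear approximation of sigmoid
    return paddle.clip(x / 6.0 + 0.5, 0.0, 1.0)

def hard_swish(x):
    # x * hard_sigmoid(x), a cheap approximation of swish (x * sigmoid(x))
    return x * hard_sigmoid(x)

x = paddle.to_tensor([-4.0, -1.0, 0.0, 1.0, 4.0])
print(hard_swish(x).numpy())
```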
|
||||
|
||||
GhostNet is a brand-new lightweight network structure proposed by Huawei in 2020. By introducing the ghost module, the problem of redundant feature computation in traditional deep networks is greatly alleviated, which significantly reduces the number of parameters and the amount of computation.
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
Currently there are 32 pretrained models of the mobile series open sourced by PaddleClas, and their indicators are shown in the figures above. As can be seen, newer lightweight models tend to perform better, and MobileNetV3 represents the latest lightweight neural network architecture. In MobileNetV3, the authors use a 1x1 convolution after global average pooling to obtain higher accuracy; this operation significantly increases the number of parameters but has little impact on the amount of computation. Therefore, from a storage perspective MobileNetV3 does not have much advantage, but thanks to its smaller computation it has a faster inference speed. In addition, the SSLD distillation models in our model zoo perform excellently, refreshing the accuracy of current lightweight models from various perspectives. Because the MobileNetV3 model has a complex structure with many branches and is not GPU friendly, its GPU inference speed is not as good as that of MobileNetV1.
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| MobileNetV1_x0_25 | 0.514 | 0.755 | 0.506 | | 0.070 | 0.460 |
|
||||
| MobileNetV1_x0_5 | 0.635 | 0.847 | 0.637 | | 0.280 | 1.310 |
|
||||
| MobileNetV1_x0_75 | 0.688 | 0.882 | 0.684 | | 0.630 | 2.550 |
|
||||
| MobileNetV1 | 0.710 | 0.897 | 0.706 | | 1.110 | 4.190 |
|
||||
| MobileNetV1_ssld | 0.779 | 0.939 | | | 1.110 | 4.190 |
|
||||
| MobileNetV2_x0_25 | 0.532 | 0.765 | | | 0.050 | 1.500 |
|
||||
| MobileNetV2_x0_5 | 0.650 | 0.857 | 0.654 | 0.864 | 0.170 | 1.930 |
|
||||
| MobileNetV2_x0_75 | 0.698 | 0.890 | 0.698 | 0.896 | 0.350 | 2.580 |
|
||||
| MobileNetV2 | 0.722 | 0.907 | 0.718 | 0.910 | 0.600 | 3.440 |
|
||||
| MobileNetV2_x1_5 | 0.741 | 0.917 | | | 1.320 | 6.760 |
|
||||
| MobileNetV2_x2_0 | 0.752 | 0.926 | | | 2.320 | 11.130 |
|
||||
| MobileNetV2_ssld | 0.7674 | 0.9339 | | | 0.600 | 3.440 |
|
||||
| MobileNetV3_large_<br>x1_25 | 0.764 | 0.930 | 0.766 | | 0.714 | 7.440 |
|
||||
| MobileNetV3_large_<br>x1_0 | 0.753 | 0.923 | 0.752 | | 0.450 | 5.470 |
|
||||
| MobileNetV3_large_<br>x0_75 | 0.731 | 0.911 | 0.733 | | 0.296 | 3.910 |
|
||||
| MobileNetV3_large_<br>x0_5 | 0.692 | 0.885 | 0.688 | | 0.138 | 2.670 |
|
||||
| MobileNetV3_large_<br>x0_35 | 0.643 | 0.855 | 0.642 | | 0.077 | 2.100 |
|
||||
| MobileNetV3_small_<br>x1_25 | 0.707 | 0.895 | 0.704 | | 0.195 | 3.620 |
|
||||
| MobileNetV3_small_<br>x1_0 | 0.682 | 0.881 | 0.675 | | 0.123 | 2.940 |
|
||||
| MobileNetV3_small_<br>x0_75 | 0.660 | 0.863 | 0.654 | | 0.088 | 2.370 |
|
||||
| MobileNetV3_small_<br>x0_5 | 0.592 | 0.815 | 0.580 | | 0.043 | 1.900 |
|
||||
| MobileNetV3_small_<br>x0_35 | 0.530 | 0.764 | 0.498 | | 0.026 | 1.660 |
|
||||
| MobileNetV3_small_<br>x0_35_ssld | 0.556 | 0.777 | 0.498 | | 0.026 | 1.660 |
|
||||
| MobileNetV3_large_<br>x1_0_ssld | 0.790 | 0.945 | | | 0.450 | 5.470 |
|
||||
| MobileNetV3_large_<br>x1_0_ssld_int8 | 0.761 | | | | | |
|
||||
| MobileNetV3_small_<br>x1_0_ssld | 0.713 | 0.901 | | | 0.123 | 2.940 |
|
||||
| ShuffleNetV2 | 0.688 | 0.885 | 0.694 | | 0.280 | 2.260 |
|
||||
| ShuffleNetV2_x0_25 | 0.499 | 0.738 | | | 0.030 | 0.600 |
|
||||
| ShuffleNetV2_x0_33 | 0.537 | 0.771 | | | 0.040 | 0.640 |
|
||||
| ShuffleNetV2_x0_5 | 0.603 | 0.823 | 0.603 | | 0.080 | 1.360 |
|
||||
| ShuffleNetV2_x1_5 | 0.716 | 0.902 | 0.726 | | 0.580 | 3.470 |
|
||||
| ShuffleNetV2_x2_0 | 0.732 | 0.912 | 0.749 | | 1.120 | 7.320 |
|
||||
| ShuffleNetV2_swish | 0.700 | 0.892 | | | 0.290 | 2.260 |
|
||||
| GhostNet_x0_5 | 0.668 | 0.869 | 0.662 | 0.866 | 0.082 | 2.600 |
|
||||
| GhostNet_x1_0 | 0.740 | 0.916 | 0.739 | 0.914 | 0.294 | 5.200 |
|
||||
| GhostNet_x1_3 | 0.757 | 0.925 | 0.757 | 0.927 | 0.440 | 7.300 |
|
||||
| GhostNet_x1_3_ssld | 0.794 | 0.945 | 0.757 | 0.927 | 0.440 | 7.300 |
|
||||
|
||||
<a name='3'></a>
|
||||
## 3. Inference speed and storage size based on SD855
|
||||
|
||||
| Models | Batch Size=1(ms) | Storage Size(M) |
|
||||
|:--:|:--:|:--:|
|
||||
| MobileNetV1_x0_25 | 3.220 | 1.900 |
|
||||
| MobileNetV1_x0_5 | 9.580 | 5.200 |
|
||||
| MobileNetV1_x0_75 | 19.436 | 10.000 |
|
||||
| MobileNetV1 | 32.523 | 16.000 |
|
||||
| MobileNetV1_ssld | 32.523 | 16.000 |
|
||||
| MobileNetV2_x0_25 | 3.799 | 6.100 |
|
||||
| MobileNetV2_x0_5 | 8.702 | 7.800 |
|
||||
| MobileNetV2_x0_75 | 15.531 | 10.000 |
|
||||
| MobileNetV2 | 23.318 | 14.000 |
|
||||
| MobileNetV2_x1_5 | 45.624 | 26.000 |
|
||||
| MobileNetV2_x2_0 | 74.292 | 43.000 |
|
||||
| MobileNetV2_ssld | 23.318 | 14.000 |
|
||||
| MobileNetV3_large_x1_25 | 28.218 | 29.000 |
|
||||
| MobileNetV3_large_x1_0 | 19.308 | 21.000 |
|
||||
| MobileNetV3_large_x0_75 | 13.565 | 16.000 |
|
||||
| MobileNetV3_large_x0_5 | 7.493 | 11.000 |
|
||||
| MobileNetV3_large_x0_35 | 5.137 | 8.600 |
|
||||
| MobileNetV3_small_x1_25 | 9.275 | 14.000 |
|
||||
| MobileNetV3_small_x1_0 | 6.546 | 12.000 |
|
||||
| MobileNetV3_small_x0_75 | 5.284 | 9.600 |
|
||||
| MobileNetV3_small_x0_5 | 3.352 | 7.800 |
|
||||
| MobileNetV3_small_x0_35 | 2.635 | 6.900 |
|
||||
| MobileNetV3_small_x0_35_ssld | 2.635 | 6.900 |
|
||||
| MobileNetV3_large_x1_0_ssld | 19.308 | 21.000 |
|
||||
| MobileNetV3_large_x1_0_ssld_int8 | 14.395 | 10.000 |
|
||||
| MobileNetV3_small_x1_0_ssld | 6.546 | 12.000 |
|
||||
| ShuffleNetV2 | 10.941 | 9.000 |
|
||||
| ShuffleNetV2_x0_25 | 2.329 | 2.700 |
|
||||
| ShuffleNetV2_x0_33 | 2.643 | 2.800 |
|
||||
| ShuffleNetV2_x0_5 | 4.261 | 5.600 |
|
||||
| ShuffleNetV2_x1_5 | 19.352 | 14.000 |
|
||||
| ShuffleNetV2_x2_0 | 34.770 | 28.000 |
|
||||
| ShuffleNetV2_swish | 16.023 | 9.100 |
|
||||
| GhostNet_x0_5 | 5.714 | 10.000 |
|
||||
| GhostNet_x1_0 | 13.558 | 20.000 |
|
||||
| GhostNet_x1_3 | 19.982 | 29.000 |
|
||||
| GhostNet_x1_3_ssld | 19.982 | 29.000 |
|
||||
|
||||
<a name='4'></a>
|
||||
## 4. Inference speed based on T4 GPU
|
||||
|
||||
| Models | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|
||||
|-----------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
|
||||
| MobileNetV1_x0_25 | 0.68422 | 1.13021 | 1.72095 | 0.67274 | 1.226 | 1.84096 |
|
||||
| MobileNetV1_x0_5 | 0.69326 | 1.09027 | 1.84746 | 0.69947 | 1.43045 | 2.39353 |
|
||||
| MobileNetV1_x0_75 | 0.6793 | 1.29524 | 2.15495 | 0.79844 | 1.86205 | 3.064 |
|
||||
| MobileNetV1 | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 |
|
||||
| MobileNetV1_ssld | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 |
|
||||
| MobileNetV2_x0_25 | 2.85399 | 3.62405 | 4.29952 | 2.81989 | 3.52695 | 4.2432 |
|
||||
| MobileNetV2_x0_5 | 2.84258 | 3.1511 | 4.10267 | 2.80264 | 3.65284 | 4.31737 |
|
||||
| MobileNetV2_x0_75 | 2.82183 | 3.27622 | 4.98161 | 2.86538 | 3.55198 | 5.10678 |
|
||||
| MobileNetV2 | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 |
|
||||
| MobileNetV2_x1_5 | 2.81852 | 4.87434 | 8.97934 | 2.79398 | 5.30149 | 9.30899 |
|
||||
| MobileNetV2_x2_0 | 3.65197 | 6.32329 | 11.644 | 3.29788 | 7.08644 | 12.45375 |
|
||||
| MobileNetV2_ssld | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 |
|
||||
| MobileNetV3_large_x1_25 | 2.34387 | 3.16103 | 4.79742 | 2.35117 | 3.44903 | 5.45658 |
|
||||
| MobileNetV3_large_x1_0 | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
|
||||
| MobileNetV3_large_x0_75 | 2.1058 | 2.61426 | 3.61021 | 2.0006 | 2.56987 | 3.78005 |
|
||||
| MobileNetV3_large_x0_5 | 2.06934 | 2.77341 | 3.35313 | 2.11199 | 2.88172 | 3.19029 |
|
||||
| MobileNetV3_large_x0_35 | 2.14965 | 2.7868 | 3.36145 | 1.9041 | 2.62951 | 3.26036 |
|
||||
| MobileNetV3_small_x1_25 | 2.06817 | 2.90193 | 3.5245 | 2.02916 | 2.91866 | 3.34528 |
|
||||
| MobileNetV3_small_x1_0 | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
|
||||
| MobileNetV3_small_x0_75 | 1.80617 | 2.64646 | 3.24513 | 1.93697 | 2.64285 | 3.32797 |
|
||||
| MobileNetV3_small_x0_5 | 1.95001 | 2.74014 | 3.39485 | 1.88406 | 2.99601 | 3.3908 |
|
||||
| MobileNetV3_small_x0_35 | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 |
|
||||
| MobileNetV3_small_x0_35_ssld | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 |
|
||||
| MobileNetV3_large_x1_0_ssld | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
|
||||
| MobileNetV3_small_x1_0_ssld | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
|
||||
| ShuffleNetV2 | 1.95064 | 2.15928 | 2.97169 | 1.89436 | 2.26339 | 3.17615 |
|
||||
| ShuffleNetV2_x0_25 | 1.43242 | 2.38172 | 2.96768 | 1.48698 | 2.29085 | 2.90284 |
|
||||
| ShuffleNetV2_x0_33 | 1.69008 | 2.65706 | 2.97373 | 1.75526 | 2.85557 | 3.09688 |
|
||||
| ShuffleNetV2_x0_5 | 1.48073 | 2.28174 | 2.85436 | 1.59055 | 2.18708 | 3.09141 |
|
||||
| ShuffleNetV2_x1_5 | 1.51054 | 2.4565 | 3.41738 | 1.45389 | 2.5203 | 3.99872 |
|
||||
| ShuffleNetV2_x2_0 | 1.95616 | 2.44751 | 4.19173 | 2.15654 | 3.18247 | 5.46893 |
|
||||
| ShuffleNetV2_swish | 2.50213 | 2.92881 | 3.474 | 2.5129 | 2.97422 | 3.69357 |
|
||||
| GhostNet_x0_5 | 2.64492 | 3.48473 | 4.48844 | 2.36115 | 3.52802 | 3.89444 |
|
||||
| GhostNet_x1_0 | 2.63120 | 3.92065 | 4.48296 | 2.57042 | 3.56296 | 4.85524 |
|
||||
| GhostNet_x1_3 | 2.89715 | 3.80329 | 4.81661 | 2.81810 | 3.72071 | 5.92269 |
|
@ -0,0 +1,64 @@
|
||||
# Other networks
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
* [3. Inference speed based on V100 GPU](#3)
|
||||
* [4. Inference speed based on T4 GPU](#4)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
In 2012, the AlexNet network proposed by Alex Krizhevsky et al. won the ImageNet competition by a large margin over the runner-up, drawing wide attention to convolutional neural networks and deep learning. AlexNet used ReLU as the activation function to alleviate the vanishing-gradient problem that sigmoid suffers from in deep networks. During training, Dropout was used to randomly drop part of the neurons, preventing the model from overfitting. In the network, overlapping max pooling replaced the average pooling commonly used in CNNs, which avoids the blurring effect of average pooling and improves feature richness. In a sense, AlexNet ignited the research and application of neural networks.
|
||||
|
||||
SqueezeNet achieved the same accuracy as AlexNet on ImageNet-1k with only 1/50 of the parameters. The core of the network is the Fire module, which uses 1x1 convolution to reduce the channel dimensionality, thus greatly saving the number of parameters. The authors created SqueezeNet by stacking a large number of Fire modules.
|
||||
|
||||
VGG is a convolutional neural network developed by researchers at Oxford University's Visual Geometry Group and DeepMind. The network explores the relationship between the depth of a convolutional neural network and its performance. By repeatedly stacking 3x3 convolution kernels and 2x2 max pooling layers, a multi-layer convolutional neural network is successfully constructed and achieves good convergence accuracy. In the end, VGG finished as the runner-up of the ILSVRC 2014 classification task and won the localization task.
|
||||
|
||||
DarkNet53 was designed by the YOLO authors for object detection. The network is mainly composed of 1x1 and 3x3 convolution kernels, with 53 layers in total, hence the name DarkNet53.
|
||||
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| AlexNet | 0.567 | 0.792 | 0.5720 | | 1.370 | 61.090 |
|
||||
| SqueezeNet1_0 | 0.596 | 0.817 | 0.575 | | 1.550 | 1.240 |
|
||||
| SqueezeNet1_1 | 0.601 | 0.819 | | | 0.690 | 1.230 |
|
||||
| VGG11 | 0.693 | 0.891 | | | 15.090 | 132.850 |
|
||||
| VGG13 | 0.700 | 0.894 | | | 22.480 | 133.030 |
|
||||
| VGG16 | 0.720 | 0.907 | 0.715 | 0.901 | 30.810 | 138.340 |
|
||||
| VGG19 | 0.726 | 0.909 | | | 39.130 | 143.650 |
|
||||
| DarkNet53 | 0.780 | 0.941 | 0.772 | 0.938 | 18.580 | 41.600 |
|
||||
|
||||
|
||||
<a name='3'></a>
|
||||
## 3. Inference speed based on V100 GPU
|
||||
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|
||||
|---------------------------|-----------|-------------------|----------------------|
|
||||
| AlexNet | 224 | 256 | 1.176 |
|
||||
| SqueezeNet1_0 | 224 | 256 | 0.860 |
|
||||
| SqueezeNet1_1 | 224 | 256 | 0.763 |
|
||||
| VGG11 | 224 | 256 | 1.867 |
|
||||
| VGG13 | 224 | 256 | 2.148 |
|
||||
| VGG16 | 224 | 256 | 2.616 |
|
||||
| VGG19 | 224 | 256 | 3.076 |
|
||||
| DarkNet53 | 256 | 256 | 3.139 |
|
||||
|
||||
<a name='4'></a>
|
||||
## 4. Inference speed based on T4 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|
||||
|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
|
||||
| AlexNet | 224 | 256 | 1.06447 | 1.70435 | 2.38402 | 1.44993 | 2.46696 | 3.72085 |
|
||||
| SqueezeNet1_0 | 224 | 256 | 0.97162 | 2.06719 | 3.67499 | 0.96736 | 2.53221 | 4.54047 |
|
||||
| SqueezeNet1_1 | 224 | 256 | 0.81378 | 1.62919 | 2.68044 | 0.76032 | 1.877 | 3.15298 |
|
||||
| VGG11 | 224 | 256 | 2.24408 | 4.67794 | 7.6568 | 3.90412 | 9.51147 | 17.14168 |
|
||||
| VGG13 | 224 | 256 | 2.58589 | 5.82708 | 10.03591 | 4.64684 | 12.61558 | 23.70015 |
|
||||
| VGG16 | 224 | 256 | 3.13237 | 7.19257 | 12.50913 | 5.61769 | 16.40064 | 32.03939 |
|
||||
| VGG19 | 224 | 256 | 3.69987 | 8.59168 | 15.07866 | 6.65221 | 20.4334 | 41.55902 |
|
||||
| DarkNet53 | 256 | 256 | 3.18101 | 5.88419 | 10.14964 | 4.10829 | 12.1714 | 22.15266 |
|
@ -0,0 +1,26 @@
|
||||
# PVTV2
|
||||
|
||||
---
|
||||
|
||||
## Content
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
PVTV2 is a Vision Transformer series model built on PVT (Pyramid Vision Transformer). PVT uses Transformer blocks to build a feature pyramid network. The main designs of PVTV2 are: (1) overlapping patch embedding, (2) convolutional feed-forward networks, and (3) linear-complexity attention layers. [Paper](https://arxiv.org/pdf/2106.13797.pdf).
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| PVT_V2_B0 | 0.705 | 0.902 | 0.705 | - | 0.53 | 3.7 |
|
||||
| PVT_V2_B1 | 0.787 | 0.945 | 0.787 | - | 2.0 | 14.0 |
|
||||
| PVT_V2_B2 | 0.821 | 0.960 | 0.820 | - | 3.9 | 25.4 |
|
||||
| PVT_V2_B3 | 0.831 | 0.965 | 0.831 | - | 6.7 | 45.2 |
|
||||
| PVT_V2_B4 | 0.836 | 0.967 | 0.836 | - | 9.8 | 62.6 |
|
||||
| PVT_V2_B5 | 0.837 | 0.966 | 0.838 | - | 11.4 | 82.0 |
|
||||
| PVT_V2_B2_Linear | 0.821 | 0.961 | 0.821 | - | 3.8 | 22.6 |
|
@ -0,0 +1,22 @@
|
||||
# RedNet series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
In the ResNet backbone and at all bottleneck positions of the backbone, RedNet replaces the convolutions with Involution, while retaining the convolutions used for channel mapping and fusion. These carefully redesigned components combine to form a new, efficient backbone network called RedNet. [paper](https://arxiv.org/abs/2103.06255).
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|
||||
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
|
||||
| RedNet26 | 9.2 | 1.7 | 75.95 | 93.19 |
|
||||
| RedNet38 | 12.4 | 2.2 | 77.47 | 93.56 |
|
||||
| RedNet50 | 15.5 | 2.7 | 78.33 | 94.17 |
|
||||
| RedNet101 | 25.7 | 4.7 | 78.94 | 94.36 |
|
||||
| RedNet152 | 34.0 | 6.8 | 79.17 | 94.40 |
|
@ -0,0 +1,29 @@
|
||||
# RepVGG series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
The RepVGG (Making VGG-style ConvNets Great Again) series is a simple but powerful convolutional neural network architecture proposed in 2021 by Tsinghua University (Guiguang Ding's team), MEGVII Technology (Jian Sun et al.), HKUST and Aberystwyth University. The inference-time architecture is similar to VGG: the main body is a stack of 3x3 convolutions and ReLU, while the training-time model has a multi-branch topology. The decoupling of the training-time and inference-time architectures is realized by a re-parameterization technique, hence the name RepVGG. [paper](https://arxiv.org/abs/2101.03697).
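The core of the re-parameterization can be sketched with plain NumPy: the 1x1 branch and the identity branch are rewritten as equivalent 3x3 kernels and summed with the 3x3 branch, so the three training-time branches collapse into a single inference-time 3x3 convolution. BatchNorm folding and biases are omitted here, and the shapes are illustrative rather than taken from an actual RepVGG block.

```python
import numpy as np

c = 4                                       # in_channels == out_channels, stride 1
k3 = np.random.randn(c, c, 3, 3)            # 3x3 branch weights
k1 = np.random.randn(c, c, 1, 1)            # 1x1 branch weights

k1_as_3 = np.zeros_like(k3)
k1_as_3[:, :, 1:2, 1:2] = k1                # place the 1x1 weight at the kernel centre

kid_as_3 = np.zeros_like(k3)
for i in range(c):
    kid_as_3[i, i, 1, 1] = 1.0              # identity branch == centred delta kernel

k_fused = k3 + k1_as_3 + kid_as_3           # single equivalent 3x3 kernel for inference
print(k_fused.shape)                        # (4, 4, 3, 3)
```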
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1| FLOPs<br>(G) |
|
||||
|:--:|:--:|:--:|:--:|:--:|
|
||||
| RepVGG_A0 | 0.7131 | 0.9016 | 0.7241 | |
|
||||
| RepVGG_A1 | 0.7380 | 0.9146 | 0.7446 | |
|
||||
| RepVGG_A2 | 0.7571 | 0.9264 | 0.7648 | |
|
||||
| RepVGG_B0 | 0.7450 | 0.9213 | 0.7514 | |
|
||||
| RepVGG_B1 | 0.7773 | 0.9385 | 0.7837 | |
|
||||
| RepVGG_B2 | 0.7813 | 0.9410 | 0.7878 | |
|
||||
| RepVGG_B1g2 | 0.7732 | 0.9359 | 0.7778 | |
|
||||
| RepVGG_B1g4 | 0.7675 | 0.9335 | 0.7758 | |
|
||||
| RepVGG_B2g4 | 0.7881 | 0.9448 | 0.7938 | |
|
||||
| RepVGG_B3g4 | 0.7965 | 0.9485 | 0.8021 | |
|
||||
|
||||
Params, FLOPs, Inference speed and other information are coming soon.
|
@ -0,0 +1,32 @@
|
||||
# ResNeSt and RegNet series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
* [3. Inference speed based on T4 GPU](#3)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
The ResNeSt series was proposed in 2020. It improves the original ResNet structure by introducing K groups and adding an attention module similar to the SE block within the different groups. Its accuracy is higher than that of the base ResNet model, while the number of parameters and FLOPs are almost the same as the base ResNet.
|
||||
|
||||
RegNet was proposed by Facebook in 2020 to deepen the concept of design space. Based on AnyNetX, the model performance is gradually improved by strategies such as sharing the bottleneck ratio, sharing the group width, and adjusting the network depth or width. Moreover, the design space structure is simplified and its interpretability is improved; the quality of the design space is improved while its diversity is maintained. Under similar conditions, the designed RegNet models perform better than EfficientNet and run 5 times faster than EfficientNet.
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| ResNeSt50_fast_1s1x64d | 0.8035 | 0.9528| 0.8035 | -| 8.68 | 26.3 |
|
||||
| ResNeSt50 | 0.8083 | 0.9542| 0.8113 | -| 10.78 | 27.5 |
|
||||
| RegNetX_4GF | 0.7850 | 0.9416| 0.7860 | -| 8.0 | 22.1 |
|
||||
|
||||
<a name='3'></a>
|
||||
## 3. Inference speed based on T4 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|
||||
|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
|
||||
| ResNeSt50_fast_1s1x64d | 224 | 256 | 3.46466 | 5.56647 | 9.11848 | 3.45405 | 8.72680 | 15.48710 |
|
||||
| ResNeSt50 | 224 | 256 | 7.05851 | 8.97676 | 13.34704 | 6.16248 | 12.0633 | 21.49936 |
|
||||
| RegNetX_4GF | 224 | 256 | 6.69042 | 8.01664 | 11.60608 | 6.46478 | 11.19862 | 16.89089 |
|
@ -0,0 +1,104 @@
|
||||
# ResNet and ResNet_vd series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
* [3. Inference speed based on V100 GPU](#3)
|
||||
* [4. Inference speed based on T4 GPU](#4)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
The ResNet series model was proposed in 2015 and won the championship in the ILSVRC2015 competition with a top5 error rate of 3.57%. The network innovatively proposed the residual structure, and built the ResNet network by stacking multiple residual structures. Experiments show that using residual blocks can improve the convergence speed and accuracy effectively.
|
||||
|
||||
Joyce Xu of Stanford University calls ResNet one of three architectures that "really redefine the way we think about neural networks." Due to the outstanding performance of ResNet, more and more scholars and engineers from academia and industry have improved its structure. The well-known variants include Wide ResNet, ResNet-vc, ResNet-vd, Res2Net, etc. The number of parameters and FLOPs of ResNet-vc and ResNet-vd are almost the same as those of ResNet, so we unify them into the ResNet series here.
|
||||
|
||||
The ResNet series models released this time include 14 pre-trained models such as ResNet50, ResNet50_vd, ResNet50_vd_ssld, and ResNet200_vd. At the training level, ResNet adopts the standard ImageNet training process, while the other improved models adopt more training strategies, such as cosine decay of the learning rate, the label smoothing regularization method, mixup added to data preprocessing, and a total number of training epochs increased from 120 to 200.
|
||||
|
||||
Among them, ResNet50_vd_v2 and ResNet50_vd_ssld adopt knowledge distillation, which further improves the accuracy while keeping the structure unchanged. Specifically, the teacher model of ResNet50_vd_v2 is ResNet152_vd (top-1 accuracy 80.59%) with ImageNet-1k as the training set; the teacher model of ResNet50_vd_ssld is ResNeXt101_32x16d_wsl (top-1 accuracy 84.2%), and the training set is the combination of 4 million images mined from ImageNet-22k and ImageNet-1k. The specific methods of knowledge distillation are being continuously updated.
|
||||
|
||||
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
|
||||
As can be seen from the above curves, the higher the number of layers, the higher the accuracy, but the corresponding number of parameters, calculation and latency will increase. ResNet50_vd_ssld further improves the accuracy of top-1 of the ImageNet-1k validation set by using stronger teachers and more data, reaching 82.39%, refreshing the accuracy of ResNet50 series models.
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| ResNet18 | 0.710 | 0.899 | 0.696 | 0.891 | 3.660 | 11.690 |
|
||||
| ResNet18_vd | 0.723 | 0.908 | | | 4.140 | 11.710 |
|
||||
| ResNet34 | 0.746 | 0.921 | 0.732 | 0.913 | 7.360 | 21.800 |
|
||||
| ResNet34_vd | 0.760 | 0.930 | | | 7.390 | 21.820 |
|
||||
| ResNet34_vd_ssld | 0.797 | 0.949 | | | 7.390 | 21.820 |
|
||||
| ResNet50 | 0.765 | 0.930 | 0.760 | 0.930 | 8.190 | 25.560 |
|
||||
| ResNet50_vc | 0.784 | 0.940 | | | 8.670 | 25.580 |
|
||||
| ResNet50_vd | 0.791 | 0.944 | 0.792 | 0.946 | 8.670 | 25.580 |
|
||||
| ResNet50_vd_v2 | 0.798 | 0.949 | | | 8.670 | 25.580 |
|
||||
| ResNet101 | 0.776 | 0.936 | 0.776 | 0.938 | 15.520 | 44.550 |
|
||||
| ResNet101_vd | 0.802 | 0.950 | | | 16.100 | 44.570 |
|
||||
| ResNet152 | 0.783 | 0.940 | 0.778 | 0.938 | 23.050 | 60.190 |
|
||||
| ResNet152_vd | 0.806 | 0.953 | | | 23.530 | 60.210 |
|
||||
| ResNet200_vd | 0.809 | 0.953 | | | 30.530 | 74.740 |
|
||||
| ResNet50_vd_ssld | 0.824 | 0.961 | | | 8.670 | 25.580 |
|
||||
| ResNet50_vd_ssld_v2 | 0.830 | 0.964 | | | 8.670 | 25.580 |
|
||||
| Fix_ResNet50_vd_ssld_v2 | 0.840 | 0.970 | | | 17.696 | 25.580 |
|
||||
| ResNet101_vd_ssld | 0.837 | 0.967 | | | 16.100 | 44.570 |
|
||||
|
||||
* Note: `ResNet50_vd_ssld_v2` is obtained by adding AutoAugment to the training process on the basis of the `ResNet50_vd_ssld` training strategy. `Fix_ResNet50_vd_ssld_v2` freezes all parameters of `ResNet50_vd_ssld_v2` except the FC layer and fine-tunes on the ImageNet-1k dataset at a resolution of 320x320.
|
||||
|
||||
<a name='3'></a>
|
||||
## 3. Inference speed based on V100 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|
||||
|------------------|-----------|-------------------|--------------------------|
|
||||
| ResNet18 | 224 | 256 | 1.499 |
|
||||
| ResNet18_vd | 224 | 256 | 1.603 |
|
||||
| ResNet34 | 224 | 256 | 2.272 |
|
||||
| ResNet34_vd | 224 | 256 | 2.343 |
|
||||
| ResNet34_vd_ssld | 224 | 256 | 2.343 |
|
||||
| ResNet50 | 224 | 256 | 2.939 |
|
||||
| ResNet50_vc | 224 | 256 | 3.041 |
|
||||
| ResNet50_vd | 224 | 256 | 3.165 |
|
||||
| ResNet50_vd_v2 | 224 | 256 | 3.165 |
|
||||
| ResNet101 | 224 | 256 | 5.314 |
|
||||
| ResNet101_vd | 224 | 256 | 5.252 |
|
||||
| ResNet152 | 224 | 256 | 7.205 |
|
||||
| ResNet152_vd | 224 | 256 | 7.200 |
|
||||
| ResNet200_vd | 224 | 256 | 8.885 |
|
||||
| ResNet50_vd_ssld | 224 | 256 | 3.165 |
|
||||
| ResNet101_vd_ssld | 224 | 256 | 5.252 |
|
||||
|
||||
<a name='4'></a>
|
||||
## 4. Inference speed based on T4 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|
||||
|-------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
|
||||
| ResNet18 | 224 | 256 | 1.3568 | 2.5225 | 3.61904 | 1.45606 | 3.56305 | 6.28798 |
|
||||
| ResNet18_vd | 224 | 256 | 1.39593 | 2.69063 | 3.88267 | 1.54557 | 3.85363 | 6.88121 |
|
||||
| ResNet34 | 224 | 256 | 2.23092 | 4.10205 | 5.54904 | 2.34957 | 5.89821 | 10.73451 |
|
||||
| ResNet34_vd | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
|
||||
| ResNet34_vd_ssld | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
|
||||
| ResNet50 | 224 | 256 | 2.63824 | 4.63802 | 7.02444 | 3.47712 | 7.84421 | 13.90633 |
|
||||
| ResNet50_vc | 224 | 256 | 2.67064 | 4.72372 | 7.17204 | 3.52346 | 8.10725 | 14.45577 |
|
||||
| ResNet50_vd | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
|
||||
| ResNet50_vd_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
|
||||
| ResNet101 | 224 | 256 | 5.04037 | 7.73673 | 10.8936 | 6.07125 | 13.40573 | 24.3597 |
|
||||
| ResNet101_vd | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
|
||||
| ResNet152 | 224 | 256 | 7.28665 | 10.62001 | 14.90317 | 8.50198 | 19.17073 | 35.78384 |
|
||||
| ResNet152_vd | 224 | 256 | 7.29127 | 10.86137 | 15.32444 | 8.54376 | 19.52157 | 36.64445 |
|
||||
| ResNet200_vd | 224 | 256 | 9.36026 | 13.5474 | 19.0725 | 10.80619 | 25.01731 | 48.81399 |
|
||||
| ResNet50_vd_ssld | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
|
||||
| ResNet50_vd_ssld_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
|
||||
| Fix_ResNet50_vd_ssld_v2 | 320 | 320 | 3.42818 | 7.51534 | 13.19370 | 5.07696 | 14.64218 | 27.01453 |
|
||||
| ResNet101_vd_ssld | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
|
@ -0,0 +1,126 @@
|
||||
# SEResNeXt and Res2Net series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
* [3. Inference speed based on V100 GPU](#3)
|
||||
* [4. Inference speed based on T4 GPU](#4)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
ResNeXt, one of the typical variants of ResNet, was presented at the CVPR conference in 2017. Prior to this, methods to improve model accuracy mainly focused on deepening or widening the network, which increased the number of parameters and the amount of computation and slowed down inference accordingly. The concept of cardinality was proposed in the ResNeXt structure. Through experiments, the authors found that increasing the number of channel groups was more effective than increasing depth or width: it improves accuracy without increasing parameter complexity and reduces the number of parameters at the same time, making ResNeXt a very successful variant of ResNet.
|
||||
|
||||
SENet is the winner of the 2017 ImageNet classification competition. It proposes a new SE structure that can be transferred to any other network. It learns a per-channel scale to enhance the important features of each channel and weaken the unimportant ones, so that the extracted features are more discriminative.
|
||||
|
||||
Res2Net is a brand-new improvement of ResNet proposed in 2019. The solution can be easily integrated with other excellent modules. Without increasing the amount of calculation, the performance on ImageNet, CIFAR-100 and other data sets exceeds ResNet. Res2Net, with its simple structure and superior performance, further explores the multi-scale representation capability of CNN at a more fine-grained level. Res2Net reveals a new dimension to improve model accuracy, called scale, which is an essential and more effective factor in addition to the existing dimensions of depth, width, and cardinality. The network also performs well in other visual tasks such as object detection and image segmentation.
|
||||
|
||||
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.
|
||||
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
|
||||
At present, there are a total of 24 pretrained models of these three categories open sourced by PaddleClas, and their indicators are shown in the figures above. It can be seen that under the same FLOPs and parameters, the improved models tend to have higher accuracy, but their inference speed is often inferior to the ResNet series. Res2Net performs better: compared with the group operation in ResNeXt and the SE structure in SEResNet, Res2Net tends to achieve better accuracy at the same FLOPs, parameters and inference speed.
|
||||
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| Res2Net50_26w_4s | 0.793 | 0.946 | 0.780 | 0.936 | 8.520 | 25.700 |
|
||||
| Res2Net50_vd_26w_4s | 0.798 | 0.949 | | | 8.370 | 25.060 |
|
||||
| Res2Net50_vd_26w_4s_ssld | 0.831 | 0.966 | | | 8.370 | 25.060 |
|
||||
| Res2Net50_14w_8s | 0.795 | 0.947 | 0.781 | 0.939 | 9.010 | 25.720 |
|
||||
| Res2Net101_vd_26w_4s | 0.806 | 0.952 | | | 16.670 | 45.220 |
|
||||
| Res2Net101_vd_26w_4s_ssld | 0.839 | 0.971 | | | 16.670 | 45.220 |
|
||||
| Res2Net200_vd_26w_4s | 0.812 | 0.957 | | | 31.490 | 76.210 |
|
||||
| Res2Net200_vd_26w_4s_ssld | **0.851** | 0.974 | | | 31.490 | 76.210 |
|
||||
| ResNeXt50_32x4d | 0.778 | 0.938 | 0.778 | | 8.020 | 23.640 |
|
||||
| ResNeXt50_vd_32x4d | 0.796 | 0.946 | | | 8.500 | 23.660 |
|
||||
| ResNeXt50_64x4d | 0.784 | 0.941 | | | 15.060 | 42.360 |
|
||||
| ResNeXt50_vd_64x4d | 0.801 | 0.949 | | | 15.540 | 42.380 |
|
||||
| ResNeXt101_32x4d | 0.787 | 0.942 | 0.788 | | 15.010 | 41.540 |
|
||||
| ResNeXt101_vd_32x4d | 0.803 | 0.951 | | | 15.490 | 41.560 |
|
||||
| ResNeXt101_64x4d | 0.784 | 0.945 | 0.796 | | 29.050 | 78.120 |
|
||||
| ResNeXt101_vd_64x4d | 0.808 | 0.952 | | | 29.530 | 78.140 |
|
||||
| ResNeXt152_32x4d | 0.790 | 0.943 | | | 22.010 | 56.280 |
|
||||
| ResNeXt152_vd_32x4d | 0.807 | 0.952 | | | 22.490 | 56.300 |
|
||||
| ResNeXt152_64x4d | 0.795 | 0.947 | | | 43.030 | 107.570 |
|
||||
| ResNeXt152_vd_64x4d | 0.811 | 0.953 | | | 43.520 | 107.590 |
|
||||
| SE_ResNet18_vd | 0.733 | 0.914 | | | 4.140 | 11.800 |
|
||||
| SE_ResNet34_vd | 0.765 | 0.932 | | | 7.840 | 21.980 |
|
||||
| SE_ResNet50_vd | 0.795 | 0.948 | | | 8.670 | 28.090 |
|
||||
| SE_ResNeXt50_32x4d | 0.784 | 0.940 | 0.789 | 0.945 | 8.020 | 26.160 |
|
||||
| SE_ResNeXt50_vd_32x4d | 0.802 | 0.949 | | | 10.760 | 26.280 |
|
||||
| SE_ResNeXt101_32x4d | 0.7939 | 0.9443 | 0.793 | 0.950 | 15.020 | 46.280 |
|
||||
| SENet154_vd | 0.814 | 0.955 | | | 45.830 | 114.290 |
|
||||
|
||||
|
||||
<a name='3'></a>
|
||||
## 3. Inference speed based on V100 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|
||||
|-----------------------|-----------|-------------------|--------------------------|
|
||||
| Res2Net50_26w_4s | 224 | 256 | 4.148 |
|
||||
| Res2Net50_vd_26w_4s | 224 | 256 | 4.172 |
|
||||
| Res2Net50_14w_8s | 224 | 256 | 5.113 |
|
||||
| Res2Net101_vd_26w_4s | 224 | 256 | 7.327 |
|
||||
| Res2Net200_vd_26w_4s | 224 | 256 | 12.806 |
|
||||
| ResNeXt50_32x4d | 224 | 256 | 10.964 |
|
||||
| ResNeXt50_vd_32x4d | 224 | 256 | 7.566 |
|
||||
| ResNeXt50_64x4d | 224 | 256 | 13.905 |
|
||||
| ResNeXt50_vd_64x4d | 224 | 256 | 14.321 |
|
||||
| ResNeXt101_32x4d | 224 | 256 | 14.915 |
|
||||
| ResNeXt101_vd_32x4d | 224 | 256 | 14.885 |
|
||||
| ResNeXt101_64x4d | 224 | 256 | 28.716 |
|
||||
| ResNeXt101_vd_64x4d | 224 | 256 | 28.398 |
|
||||
| ResNeXt152_32x4d | 224 | 256 | 22.996 |
|
||||
| ResNeXt152_vd_32x4d | 224 | 256 | 22.729 |
|
||||
| ResNeXt152_64x4d | 224 | 256 | 46.705 |
|
||||
| ResNeXt152_vd_64x4d | 224 | 256 | 46.395 |
|
||||
| SE_ResNet18_vd | 224 | 256 | 1.694 |
|
||||
| SE_ResNet34_vd | 224 | 256 | 2.786 |
|
||||
| SE_ResNet50_vd | 224 | 256 | 3.749 |
|
||||
| SE_ResNeXt50_32x4d | 224 | 256 | 8.924 |
|
||||
| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.011 |
|
||||
| SE_ResNeXt101_32x4d | 224 | 256 | 19.204 |
|
||||
| SENet154_vd | 224 | 256 | 50.406 |
|
||||
|
||||
<a name='4'></a>
|
||||
## 4. Inference speed based on T4 GPU
|
||||
|
||||
| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|
||||
|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
|
||||
| Res2Net50_26w_4s | 224 | 256 | 3.56067 | 6.61827 | 11.41566 | 4.47188 | 9.65722 | 17.54535 |
|
||||
| Res2Net50_vd_26w_4s | 224 | 256 | 3.69221 | 6.94419 | 11.92441 | 4.52712 | 9.93247 | 18.16928 |
|
||||
| Res2Net50_14w_8s | 224 | 256 | 4.45745 | 7.69847 | 12.30935 | 5.4026 | 10.60273 | 18.01234 |
|
||||
| Res2Net101_vd_26w_4s | 224 | 256 | 6.53122 | 10.81895 | 18.94395 | 8.08729 | 17.31208 | 31.95762 |
|
||||
| Res2Net200_vd_26w_4s | 224 | 256 | 11.66671 | 18.93953 | 33.19188 | 14.67806 | 32.35032 | 63.65899 |
|
||||
| ResNeXt50_32x4d | 224 | 256 | 7.61087 | 8.88918 | 12.99674 | 7.56327 | 10.6134 | 18.46915 |
|
||||
| ResNeXt50_vd_32x4d | 224 | 256 | 7.69065 | 8.94014 | 13.4088 | 7.62044 | 11.03385 | 19.15339 |
|
||||
| ResNeXt50_64x4d | 224 | 256 | 13.78688 | 15.84655 | 21.79537 | 13.80962 | 18.4712 | 33.49843 |
|
||||
| ResNeXt50_vd_64x4d | 224 | 256 | 13.79538 | 15.22201 | 22.27045 | 13.94449 | 18.88759 | 34.28889 |
|
||||
| ResNeXt101_32x4d | 224 | 256 | 16.59777 | 17.93153 | 21.36541 | 16.21503 | 19.96568 | 33.76831 |
|
||||
| ResNeXt101_vd_32x4d | 224 | 256 | 16.36909 | 17.45681 | 22.10216 | 16.28103 | 20.25611 | 34.37152 |
|
||||
| ResNeXt101_64x4d | 224 | 256 | 30.12355 | 32.46823 | 38.41901 | 30.4788 | 36.29801 | 68.85559 |
|
||||
| ResNeXt101_vd_64x4d | 224 | 256 | 30.34022 | 32.27869 | 38.72523 | 30.40456 | 36.77324 | 69.66021 |
|
||||
| ResNeXt152_32x4d | 224 | 256 | 25.26417 | 26.57001 | 30.67834 | 24.86299 | 29.36764 | 52.09426 |
|
||||
| ResNeXt152_vd_32x4d | 224 | 256 | 25.11196 | 26.70515 | 31.72636 | 25.03258 | 30.08987 | 52.64429 |
|
||||
| ResNeXt152_64x4d | 224 | 256 | 46.58293 | 48.34563 | 56.97961 | 46.7564 | 56.34108 | 106.11736 |
|
||||
| ResNeXt152_vd_64x4d | 224 | 256 | 47.68447 | 48.91406 | 57.29329 | 47.18638 | 57.16257 | 107.26288 |
|
||||
| SE_ResNet18_vd | 224 | 256 | 1.61823 | 3.1391 | 4.60282 | 1.7691 | 4.19877 | 7.5331 |
|
||||
| SE_ResNet34_vd | 224 | 256 | 2.67518 | 5.04694 | 7.18946 | 2.88559 | 7.03291 | 12.73502 |
|
||||
| SE_ResNet50_vd | 224 | 256 | 3.65394 | 7.568 | 12.52793 | 4.28393 | 10.38846 | 18.33154 |
|
||||
| SE_ResNeXt50_32x4d | 224 | 256 | 9.06957 | 11.37898 | 18.86282 | 8.74121 | 13.563 | 23.01954 |
|
||||
| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.25016 | 11.85045 | 25.57004 | 9.17134 | 14.76192 | 19.914 |
|
||||
| SE_ResNeXt101_32x4d | 224 | 256 | 19.34455 | 20.6104 | 32.20432 | 18.82604 | 25.31814 | 41.97758 |
|
||||
| SENet154_vd | 224 | 256 | 49.85733 | 54.37267 | 74.70447 | 53.79794 | 66.31684 | 121.59885 |
|
@ -0,0 +1,28 @@
|
||||
# SwinTransformer
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
Swin Transformer is a new vision Transformer that can serve as a general-purpose backbone for computer vision. It is a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing cross-window connections. [Paper](https://arxiv.org/abs/2103.14030).
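The windowing step that makes this efficient is essentially a reshape; a minimal sketch in PaddlePaddle is given below. The NHWC layout and the 7x7 window size are assumptions for illustration, not the exact PaddleClas implementation.

```python
import paddle

def window_partition(x, window_size):
    """Split a [N, H, W, C] feature map into non-overlapping windows."""
    n, h, w, c = x.shape
    x = paddle.reshape(x, [n, h // window_size, window_size, w // window_size, window_size, c])
    x = paddle.transpose(x, [0, 1, 3, 2, 4, 5])
    return paddle.reshape(x, [-1, window_size, window_size, c])  # [num_windows*N, ws, ws, C]

feat = paddle.rand([1, 56, 56, 96])
print(window_partition(feat, 7).shape)      # [64, 7, 7, 96]
```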
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| SwinTransformer_tiny_patch4_window7_224 | 0.8069 | 0.9534 | 0.812 | 0.955 | 4.5 | 28 |
|
||||
| SwinTransformer_small_patch4_window7_224 | 0.8275 | 0.9613 | 0.832 | 0.962 | 8.7 | 50 |
|
||||
| SwinTransformer_base_patch4_window7_224 | 0.8300 | 0.9626 | 0.835 | 0.965 | 15.4 | 88 |
|
||||
| SwinTransformer_base_patch4_window12_384 | 0.8439 | 0.9693 | 0.845 | 0.970 | 47.1 | 88 |
|
||||
| SwinTransformer_base_patch4_window7_224<sup>[1]</sup> | 0.8487 | 0.9746 | 0.852 | 0.975 | 15.4 | 88 |
|
||||
| SwinTransformer_base_patch4_window12_384<sup>[1]</sup> | 0.8642 | 0.9807 | 0.864 | 0.980 | 47.1 | 88 |
|
||||
| SwinTransformer_large_patch4_window7_224<sup>[1]</sup> | 0.8596 | 0.9783 | 0.863 | 0.979 | 34.5 | 197 |
|
||||
| SwinTransformer_large_patch4_window12_384<sup>[1]</sup> | 0.8719 | 0.9823 | 0.873 | 0.982 | 103.9 | 197 |
|
||||
|
||||
[1]: Pre-trained on the ImageNet-22k dataset and then fine-tuned on the ImageNet-1k dataset.
|
||||
|
||||
**Note**: The difference in accuracy from the reference results comes from the difference in data preprocessing.
|
@ -0,0 +1,19 @@
|
||||
# TNT series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
TNT (Transformer-iN-Transformer) series models were proposed by Huawei-Noah in 2021 for modeling both patch-level and pixel-level representations. In each TNT block, an outer transformer block processes patch embeddings, and an inner transformer block extracts local features from pixel embeddings. The pixel-level features are projected to the patch-embedding space by a linear transformation layer and then added to the patches. By stacking TNT blocks, the TNT model for image recognition is built. Experiments on the ImageNet benchmark and downstream tasks demonstrate the superiority and efficiency of the TNT architecture. For example, TNT achieves 81.3% top-1 accuracy on ImageNet, which is 1.5% higher than that of DeiT with similar computational cost. [Paper](https://arxiv.org/abs/2103.00112).
|
||||
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|
||||
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
|
||||
| TNT_small | 23.8 | 5.2 | 81.12 | 95.56 |
|
@ -0,0 +1,24 @@
|
||||
# Twins
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
The Twins network includes Twins-PCPVT and Twins-SVT, which focus on the careful design of the spatial attention mechanism, resulting in a simple but effective solution. Since the architecture only involves matrix multiplication, for which current deep learning frameworks are highly optimized, it is very efficient and easy to implement. Moreover, this architecture achieves excellent performance in a variety of downstream vision tasks such as image classification, object detection, and semantic segmentation. [Paper](https://arxiv.org/abs/2104.13840).
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| pcpvt_small | 0.8082 | 0.9552 | 0.812 | - | 3.7 | 24.1 |
|
||||
| pcpvt_base | 0.8242 | 0.9619 | 0.827 | - | 6.4 | 43.8 |
|
||||
| pcpvt_large | 0.8273 | 0.9650 | 0.831 | - | 9.5 | 60.9 |
|
||||
| alt_gvt_small | 0.8140 | 0.9546 | 0.817 | - | 2.8 | 24 |
|
||||
| alt_gvt_base | 0.8294 | 0.9621 | 0.832 | - | 8.3 | 56 |
|
||||
| alt_gvt_large | 0.8331 | 0.9642 | 0.837 | - | 14.8 | 99.2 |
|
||||
|
||||
**Note**: The difference in accuracy from the reference results is due to the difference in data preprocessing.
|
@ -0,0 +1,41 @@
|
||||
# ViT and DeiT series
|
||||
---
|
||||
## Catalogue
|
||||
|
||||
* [1. Overview](#1)
|
||||
* [2. Accuracy, FLOPs and Parameters](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Overview
|
||||
|
||||
ViT (Vision Transformer) series models were proposed by Google in 2020. These models use only the standard Transformer structure and completely abandon convolutions: the image is split into multiple patches, which are then fed into the Transformer, showing the potential of Transformers in the CV field. [Paper](https://arxiv.org/abs/2010.11929).
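The patch splitting is usually implemented as a strided convolution; the sketch below shows the idea with ViT-Base-like numbers (16x16 patches, 768-dimensional embeddings), which are assumptions for illustration rather than the exact PaddleClas configuration.

```python
import paddle
import paddle.nn as nn

patch_embed = nn.Conv2D(3, 768, kernel_size=16, stride=16)   # one patch per 16x16 block

img = paddle.rand([1, 3, 224, 224])
tokens = patch_embed(img).flatten(2).transpose([0, 2, 1])    # [1, 196, 768]: 196 patch tokens
print(tokens.shape)
```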
|
||||
|
||||
DeiT (Data-efficient Image Transformers) series models were proposed by Facebook at the end of 2020. To address the problem that ViT models need large-scale datasets for training, DeiT improves on them and finally achieves 83.1% Top-1 accuracy on ImageNet. More importantly, by using a convolutional model as the teacher and performing knowledge distillation on these models, a Top-1 accuracy of 85.2% can be achieved on the ImageNet dataset.
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Accuracy, FLOPs and Parameters
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| ViT_small_patch16_224 | 0.7769 | 0.9342 | 0.7785 | 0.9342 | | |
|
||||
| ViT_base_patch16_224 | 0.8195 | 0.9617 | 0.8178 | 0.9613 | | |
|
||||
| ViT_base_patch16_384 | 0.8414 | 0.9717 | 0.8420 | 0.9722 | | |
|
||||
| ViT_base_patch32_384 | 0.8176 | 0.9613 | 0.8166 | 0.9613 | | |
|
||||
| ViT_large_patch16_224 | 0.8323 | 0.9650 | 0.8306 | 0.9644 | | |
|
||||
| ViT_large_patch16_384 | 0.8513 | 0.9736 | 0.8517 | 0.9736 | | |
|
||||
| ViT_large_patch32_384 | 0.8153 | 0.9608 | 0.815 | - | | |
|
||||
|
||||
|
||||
| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Params<br>(M) |
|
||||
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
|
||||
| DeiT_tiny_patch16_224 | 0.718 | 0.910 | 0.722 | 0.911 | | |
|
||||
| DeiT_small_patch16_224 | 0.796 | 0.949 | 0.799 | 0.950 | | |
|
||||
| DeiT_base_patch16_224 | 0.817 | 0.957 | 0.818 | 0.956 | | |
|
||||
| DeiT_base_patch16_384 | 0.830 | 0.962 | 0.829 | 0.972 | | |
|
||||
| DeiT_tiny_distilled_patch16_224 | 0.741 | 0.918 | 0.745 | 0.919 | | |
|
||||
| DeiT_small_distilled_patch16_224 | 0.809 | 0.953 | 0.812 | 0.954 | | |
|
||||
| DeiT_base_distilled_patch16_224 | 0.831 | 0.964 | 0.834 | 0.965 | | |
|
||||
| DeiT_base_distilled_patch16_384 | 0.851 | 0.973 | 0.852 | 0.972 | | |
|
||||
|
||||
|
||||
Params, FLOPs, Inference speed and other information are coming soon.
|
@ -0,0 +1,30 @@
|
||||
models
|
||||
================================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
DPN_DenseNet_en.md
|
||||
models_intro_en.md
|
||||
RepVGG_en.md
|
||||
EfficientNet_and_ResNeXt101_wsl_en.md
|
||||
ViT_and_DeiT_en.md
|
||||
SwinTransformer_en.md
|
||||
Others_en.md
|
||||
SEResNext_and_Res2Net_en.md
|
||||
ESNet_en.md
|
||||
HRNet_en.md
|
||||
ReXNet_en.md
|
||||
Inception_en.md
|
||||
TNT_en.md
|
||||
RedNet_en.md
|
||||
DLA_en.md
|
||||
ResNeSt_RegNet_en.md
|
||||
PP-LCNet_en.md
|
||||
HarDNet_en.md
|
||||
ResNet_and_vd_en.md
|
||||
LeViT_en.md
|
||||
Mobile_en.md
|
||||
MixNet_en.md
|
||||
Twins_en.md
|
||||
PVTV2_en.md
|
@ -0,0 +1,10 @@
|
||||
models_training
|
||||
================================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
config_description_en.md
|
||||
recognition_en.md
|
||||
classification_en.md
|
||||
train_strategy_en.md
|
@ -0,0 +1,122 @@
|
||||
# Tricks for Training
|
||||
|
||||
## Catalogue
|
||||
|
||||
- [1. Choice of Optimizers](#1)
|
||||
- [2. Choice of Learning Rate and Learning Rate Declining Strategy](#2)
|
||||
- [2.1 Concept of Learning Rate](#2.1)
|
||||
- [2.2 Learning Rate Decline Strategy](#2.2)
|
||||
- [2.3 Warmup Strategy](#2.3)
|
||||
- [3. Choice of Batch_size](#3)
|
||||
- [4. Choice of Weight_decay](#4)
|
||||
- [5. Choice of Label_smoothing](#5)
|
||||
- [6. Change the Crop Area and Stretch Transformation Degree of the Images for Small Models](#6)
|
||||
- [7. Use Data Augmentation to Improve Accuracy](#7)
|
||||
- [8. Determine the Tuning Strategy by Train_acc and Test_acc](#8)
|
||||
- [9. Improve the Accuracy of Your Own Data Set with Existing Pre-trained Models](#9)
|
||||
|
||||
<a name="1"></a>
|
||||
## 1. Choice of Optimizers
|
||||
Since the development of deep learning, many researchers have worked on optimizers. The purpose of an optimizer is to make the loss function as small as possible, so as to find suitable parameters for a given task. At present, the main optimizers used in model training are SGD, RMSProp, Adam, AdaDelta and so on. The SGD optimizer with momentum is widely used in academia and industry, so most of the models we release are trained with it. SGD with momentum has two disadvantages: the convergence speed is slow, and the initial learning rate is difficult to set. However, if the initial learning rate is set properly and the model is trained for enough iterations, SGD with momentum can reach higher accuracy than other optimizers. Optimizers with an adaptive learning rate, such as Adam and RMSProp, tend to converge faster, but their final convergence accuracy is slightly worse. If you want faster convergence, we recommend an optimizer with an adaptive learning rate; if you want higher accuracy, we recommend the SGD optimizer with momentum.
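As a minimal configuration sketch (the `model` below is only a stand-in, and PaddleClas itself configures optimizers through its YAML files), the two typical choices look like this in PaddlePaddle:

```python
import paddle

model = paddle.nn.Linear(32, 10)            # stand-in for a real network

# SGD with momentum: slower to converge but usually the best final accuracy.
sgd = paddle.optimizer.Momentum(learning_rate=0.1, momentum=0.9,
                                parameters=model.parameters())

# Adaptive learning rate: faster convergence, slightly worse final accuracy.
adam = paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters())
```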
|
||||
|
||||
<a name="2"></a>
|
||||
## 2. Choice of Learning Rate and Learning Rate Declining Strategy
|
||||
The choice of learning rate is related to the optimizer, the dataset and the task. Here we mainly introduce the choice of learning rate and learning rate decline strategy for training ImageNet-1k with SGD with momentum as the optimizer.
|
||||
|
||||
<a name="2.1"></a>
|
||||
### 2.1 Concept of Learning Rate
|
||||
The learning rate is the hyperparameter that controls the learning speed. The lower the learning rate, the slower the loss value changes. Although using a low learning rate ensures that you will not miss any local minimum, it also means that convergence is slow, especially when the optimization is trapped in a plateau region of the gradient.
|
||||
|
||||
<a name="2.2"></a>
|
||||
### 2.2 Learning Rate Decline Strategy
|
||||
During training, if we always use the same learning rate, we cannot obtain the model with the highest accuracy, so the learning rate should be adjusted during training. In the early stage of training, the weights are in a randomly initialized state and the gradients tend to be large, so a relatively large learning rate can be used for faster convergence. In the late stage of training, the weights are close to their optimal values, which cannot be reached with a relatively large learning rate, so a relatively small learning rate should be used. Many researchers use the piecewise_decay schedule, which decreases the learning rate in steps. For example, when training ResNet50, the initial learning rate is 0.1 and it drops to 1/10 every 30 epochs, with 120 epochs of training in total. Besides piecewise_decay, other schedules such as polynomial_decay, exponential_decay and cosine_decay have also been proposed. Among them, cosine_decay has become the preferred way to improve model accuracy because it requires no extra hyperparameter tuning and is relatively robust. The learning rate curves of cosine_decay and piecewise_decay are shown in the following figure: during the entire training process, cosine_decay keeps a relatively large learning rate, so its convergence is slower, but its final accuracy is better than that of piecewise_decay.
|
||||
|
||||

|
||||
|
||||
In addition, we can also see from the figures that cosine_decay spends fewer epochs at a small learning rate, which affects the final accuracy, so to make cosine_decay work better, it is recommended to use it with a larger number of epochs, such as 200.
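The two schedules discussed above can be written down directly; the sketch below uses the ResNet50 numbers from the text (initial learning rate 0.1, a 10x drop every 30 epochs, 200 epochs for cosine) purely for illustration.

```python
import math

def piecewise_decay(epoch, base_lr=0.1, step=30, gamma=0.1):
    # divide the learning rate by 10 every `step` epochs
    return base_lr * gamma ** (epoch // step)

def cosine_decay(epoch, base_lr=0.1, total_epochs=200):
    # smooth decay from base_lr towards 0; stays comparatively large for longer
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))

print(piecewise_decay(45), round(cosine_decay(45), 4))   # 0.01 vs ~0.088
```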
|
||||
|
||||
<a name="2.3"></a>
|
||||
### 2.3 Warmup Strategy
|
||||
If a large batch_size is adopted to train a neural network, we recommend the warmup strategy. As the name suggests, warmup lets the model warm up first: instead of using the initial learning rate directly at the beginning of training, a gradually increasing learning rate is used, and when it reaches the initial learning rate, the learning rate decay method mentioned above takes over. Experiments show that when the batch size is large, the warmup strategy can improve accuracy. For models trained with a large batch_size, such as MobileNetV3, we set the warmup epochs to 5 by default; that is, during the first 5 epochs the learning rate increases from 0 to the initial learning rate, and then the learning rate decay begins.
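Combining warmup with the cosine schedule from the previous sketch gives, for example, the following (5 warmup epochs as in the MobileNetV3 recipe described above; the other numbers are illustrative):

```python
import math

def warmup_cosine_lr(epoch, base_lr=0.1, warmup_epochs=5, total_epochs=200):
    if epoch < warmup_epochs:
        # linear ramp from 0 up to the initial learning rate
        return base_lr * (epoch + 1) / warmup_epochs
    # then hand over to cosine decay for the remaining epochs
    t = epoch - warmup_epochs
    return 0.5 * base_lr * (1 + math.cos(math.pi * t / (total_epochs - warmup_epochs)))

print([round(warmup_cosine_lr(e), 3) for e in range(7)])   # 0.02, 0.04, ..., 0.1, then decay
```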
|
||||
|
||||
<a name="3"></a>
|
||||
## 3. Choice of Batch_size
|
||||
Batch_size is an important hyperparameter in training neural networks; it determines how much data is fed to the neural network for training at a time. In the paper [1], the authors found that when the batch_size is linearly related to the learning rate, the convergence accuracy is hardly affected. When training on ImageNet, an initial learning rate of 0.1 with a batch_size of 256 is commonly chosen, so according to the actual model size and memory, you can set the learning rate to 0.1\*k and the batch_size to 256\*k.
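The linear scaling rule amounts to keeping the ratio of learning rate to batch size constant, for example:

```python
# Reference recipe: lr 0.1 at batch_size 256; scale both by the same factor k.
base_lr, base_bs = 0.1, 256
k = 4                                   # e.g. 4x more GPU memory or GPUs available
lr, batch_size = base_lr * k, base_bs * k
print(lr, batch_size)                   # 0.4 1024
```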
|
||||
|
||||
<a name="4"></a>
|
||||
## 4. Choice of Weight_decay
|
||||
Overfitting is a common term in machine learning: the model performs well on the training data but poorly on the test data. Convolutional neural networks also suffer from overfitting. To avoid it, many regularization methods have been proposed; among them, weight_decay is one of the most widely used. L2 regularization (weight_decay) is added to the final loss function. With the help of L2 regularization, the network weights tend toward smaller values, the parameters of the entire network tend toward 0, and the generalization of the model improves accordingly. In different deep learning frameworks the coefficient of L2 regularization has different names; in Paddle it is called L2_decay, so that name is used below. The larger the coefficient, the more the model tends to underfit. For training ImageNet, this parameter is set to 1e-4 for most networks. In some small networks such as the MobileNet series, the value is set to 1e-5 ~ 4e-5 to avoid underfitting. The setting of this value is also related to the specific dataset: when the dataset is large, the network tends to underfit and the value can be appropriately reduced; when the dataset is small, the network tends to overfit and the value can be appropriately increased. The following table shows the accuracy of MobileNetV1_x0_25 using different l2_decay values on ImageNet-1k. Since MobileNetV1_x0_25 is a relatively small network, a large l2_decay makes it underfit, so for this network 3e-5 is a better choice than 1e-4.
|
||||
|
||||
| Model | L2_decay | Train acc1/acc5 | Test acc1/acc5 |
|
||||
|:--:|:--:|:--:|:--:|
|
||||
| MobileNetV1_x0_25 | 1e-4 | 43.79%/67.61% | 50.41%/74.70% |
|
||||
| MobileNetV1_x0_25 | 3e-5 | 47.38%/70.83% | 51.45%/75.45% |
|
||||
|
||||
In addition, the setting of L2_decay is also related to whether other regularization is used during training. If the data augmentation during training is more complicated, which means that training becomes more difficult, L2_decay can be appropriately reduced. The following table shows the accuracy of ResNet50 using different l2_decay values on ImageNet-1k. It is easy to observe that after the training becomes more difficult, using a smaller l2_decay helps to improve the accuracy of the model.
|
||||
|
||||
| Model | L2_decay | Train acc1/acc5 | Test acc1/acc5 |
|
||||
|:--:|:--:|:--:|:--:|
|
||||
| ResNet50 | 1e-4 | 75.13%/90.42% | 77.65%/93.79% |
|
||||
| ResNet50 | 7e-5 | 75.56%/90.55% | 78.04%/93.74% |
|
||||
|
||||
In summary, L2_decay can be adjusted according to the specific task and model. Usually, a larger L2_decay is recommended for simple tasks or larger models, and a smaller L2_decay for complex tasks or smaller models.
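
For reference, the sketch below shows what the L2_decay coefficient does to the loss; the toy weights and the coefficient are illustrative only.

```python
import numpy as np

def l2_regularized_loss(data_loss, weights, l2_decay=1e-4):
    """Add an L2 penalty to the data loss.

    `weights` is a list of weight arrays; the penalty pushes them toward
    smaller values, which is the regularizing effect discussed above.
    """
    penalty = sum(np.sum(w ** 2) for w in weights)
    return data_loss + 0.5 * l2_decay * penalty

# Toy example with two random weight matrices.
w1 = np.random.randn(3, 3)
w2 = np.random.randn(3, 1)
print(l2_regularized_loss(1.25, [w1, w2]))
```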
|
||||
|
||||
<a name="5"></a>
|
||||
## 5. Choice of Label_smoothing
|
||||
Label_smoothing is a regularization method in deep learning; its full name is Label Smoothing Regularization (LSR). In a traditional classification task, the loss is the cross entropy between the real one-hot label and the output of the neural network. Label smoothing turns the hard one-hot label into a soft label, so the network no longer learns from hard labels but from soft labels with probability values, where the probability at the position of the true category is the largest and the probabilities at the other positions are very small; the exact calculation can be found in paper [2]. Label smoothing has an epsilon parameter describing how much the label is softened: the larger epsilon, the smaller the peak probability and the smoother the label; conversely, the label tends toward a hard label. When training on ImageNet-1k, this parameter is usually set to 0.1. In experiments training ResNet50, the accuracy with label_smoothing is higher than without it; the following table shows the performance of ResNet50_vd with and without label smoothing.
|
||||
|
||||
| Model | Use_label_smoothing | Test acc1 |
|
||||
|:--:|:--:|:--:|
|
||||
| ResNet50_vd | 0 | 77.9% |
|
||||
| ResNet50_vd | 1 | 78.4% |
|
||||
|
||||
However, because label smoothing can be regarded as a form of regularization, on relatively small models the accuracy improvement is not obvious or the accuracy even decreases. The following table shows the accuracy of ResNet18 on ImageNet-1k with and without label smoothing; it can be clearly seen that after using label smoothing, the accuracy of ResNet18 decreases.
|
||||
|
||||
| Model | Use_label_smoothing | Train acc1/acc5 | Test acc1/acc5 |
|
||||
|:--:|:--:|:--:|:--:|
|
||||
| ResNet18 | 0 | 69.81%/87.70% | 70.98%/89.92% |
|
||||
| ResNet18 | 1 | 68.00%/86.56% | 70.81%/89.89% |
|
||||
|
||||
|
||||
In summary, using label_smoothing on larger models can effectively improve accuracy, while using it on smaller models may reduce accuracy. Before deciding whether to use label_smoothing, you should therefore evaluate the size of the model and the difficulty of the task.
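
The smoothing operation itself is a small formula; the sketch below (with an illustrative epsilon, class count, and logits) shows how a hard one-hot label becomes a soft label and how cross entropy is computed against it.

```python
import numpy as np

def smooth_label(label, num_classes, epsilon=0.1):
    """Turn a hard class index into a soft label vector (LSR, see [2])."""
    soft = np.full(num_classes, epsilon / num_classes)
    soft[label] += 1.0 - epsilon
    return soft

def cross_entropy(logits, soft_label):
    """Cross entropy between softmax(logits) and a (soft) target distribution."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return -np.sum(soft_label * np.log(probs + 1e-12))

# Example: true class 2 out of 5 classes, epsilon = 0.1.
target = smooth_label(2, num_classes=5)
print(target)  # [0.02 0.02 0.92 0.02 0.02]
print(cross_entropy(np.array([0.1, 0.2, 2.0, 0.3, 0.1]), target))
```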
|
||||
|
||||
<a name="6"></a>
|
||||
## 6. Change the Crop Area and Stretch Transformation Degree of the Images for Small Models
|
||||
In the standard preprocessing of ImageNet-1k data, the random_crop function defines two parameters, scale and ratio, which respectively determine the size of the crop and the degree to which the image is stretched. The default range of scale is 0.08-1 (lower_scale-upper_scale) and the default range of ratio is 3/4-4/3 (lower_ratio-upper_ratio). For small networks, such strong data augmentation leads to underfitting and a drop in accuracy. To improve accuracy in this case, the data augmentation can be weakened, that is, the crop area of the images can be increased or the stretching of the images weakened; this is achieved by increasing lower_scale or by narrowing the gap between lower_ratio and upper_ratio. The following table lists the accuracy of MobileNetV2_x0_25 trained with different lower_scale values. It can be seen that both training accuracy and validation accuracy improve after increasing the crop area of the images.
|
||||
|
||||
| Model | Scale Range | Train_acc1/acc5 | Test_acc1/acc5 |
|
||||
|:--:|:--:|:--:|:--:|
|
||||
| MobileNetV2_x0_25 | [0.08,1] | 50.36%/72.98% | 52.35%/75.65% |
|
||||
| MobileNetV2_x0_25 | [0.2,1] | 54.39%/77.08% | 53.18%/76.14% |
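
For reference, the sketch below mimics how a random_crop-style transform samples a crop box from the `scale` and `ratio` ranges; it is a simplified illustration rather than the exact PaddleClas implementation.

```python
import math
import random

def sample_crop(img_w, img_h, scale=(0.2, 1.0), ratio=(3 / 4, 4 / 3)):
    """Sample a crop whose area fraction lies in `scale` and whose aspect
    ratio lies in `ratio`, similar to a RandomResizedCrop transform.

    Raising the lower bound of `scale` (e.g. 0.08 -> 0.2) keeps larger
    regions of the image, i.e. a weaker augmentation for small models.
    """
    area = img_w * img_h
    for _ in range(10):
        target_area = area * random.uniform(*scale)
        aspect = math.exp(random.uniform(math.log(ratio[0]), math.log(ratio[1])))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= img_w and 0 < h <= img_h:
            x = random.randint(0, img_w - w)
            y = random.randint(0, img_h - h)
            return x, y, w, h
    return 0, 0, img_w, img_h  # fallback: keep the whole image

print(sample_crop(256, 256))
```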
|
||||
|
||||
<a name="7"></a>
|
||||
## 7. Use Data Augmentation to Improve Accuracy
|
||||
In general, the size of the dataset is critical to performance, but image annotation is often expensive, so the number of annotated images is often scarce. In this case, data augmentation is particularly important. The standard data augmentation for training on ImageNet-1k mainly uses two methods: random_crop and random_flip. In recent years, more and more data augmentation methods have been proposed, such as Cutout, Mixup, CutMix, AutoAugment, etc. Experiments show that these methods can effectively improve the accuracy of the model. The following table lists the performance of ResNet50 with 8 different data augmentation methods. It can be seen that, compared to the baseline, every method improves the accuracy of ResNet50, and CutMix is currently the most effective one. More information about data augmentation can be found here: [**Data Augmentation**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/image_augmentation/ImageAugment.html).
|
||||
|
||||
| Model | Data Augmentation | Test top-1 |
|
||||
|:--:|:--:|:--:|
|
||||
| ResNet50 | Baseline | 77.31% |
|
||||
| ResNet50 | Auto-Augment | 77.95% |
|
||||
| ResNet50 | Mixup | 78.28% |
|
||||
| ResNet50 | Cutmix | 78.39% |
|
||||
| ResNet50 | Cutout | 78.01% |
|
||||
| ResNet50 | Gridmask | 77.85% |
|
||||
| ResNet50 | Random-Augment | 77.70% |
|
||||
| ResNet50 | Random-Erasing | 77.91% |
|
||||
| ResNet50 | Hide-and-Seek | 77.43% |
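
As a concrete example of one method in the table, here is a minimal mixup sketch; the batch shapes and the alpha value are illustrative.

```python
import numpy as np

def mixup(images, one_hot_labels, alpha=0.2):
    """Mix each sample with a randomly chosen partner from the same batch.

    Images and soft labels are interpolated with the same lambda drawn
    from a Beta(alpha, alpha) distribution.
    """
    lam = np.random.beta(alpha, alpha)
    index = np.random.permutation(len(images))
    mixed_images = lam * images + (1 - lam) * images[index]
    mixed_labels = lam * one_hot_labels + (1 - lam) * one_hot_labels[index]
    return mixed_images, mixed_labels

# Toy batch: 8 images of 32x32x3 with 10 classes.
x = np.random.rand(8, 32, 32, 3).astype("float32")
y = np.eye(10, dtype="float32")[np.random.randint(0, 10, size=8)]
mx, my = mixup(x, y)
print(mx.shape, my.shape)
```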
|
||||
|
||||
<a name="8"></a>
|
||||
## 8. Determine the Tuning Strategy by Train_acc and Test_acc
|
||||
During training, the training-set accuracy and validation-set accuracy of each epoch are usually printed. Generally speaking, it is a good sign when the training accuracy is slightly higher than or equal to the validation accuracy. If the training accuracy is much higher than the validation accuracy, the task is overfitting and needs more regularization, for example a larger L2_decay, more data augmentation, or label smoothing. If the training accuracy is lower than the validation accuracy, the task is underfitting, and we recommend reducing L2_decay, using less data augmentation, increasing the crop area of the images, weakening the stretching of the images, removing label_smoothing, and so on.
|
||||
|
||||
<a name="9"></a>
|
||||
## 9. Improve the Accuracy of Your Own Data Set with Existing Pre-trained Models
|
||||
In the field of computer vision, it has become common to load pre-trained models when training one's own task. Compared with training from random initialization, loading a pre-trained model can often improve the accuracy of the specific task. In general, the pre-trained models widely used in industry are obtained from the ImageNet-1k dataset. The fc-layer weight of such a pre-trained model is a k\*1000 matrix, where k is the number of input neurons, and the fc-layer weights do not need to be loaded because the tasks differ. In terms of the learning rate, if your training dataset is particularly small (say, fewer than 1,000 images), we recommend a smaller initial learning rate, such as 0.001 (batch_size 256, the same below), to avoid a large learning rate destroying the pre-trained weights; if your training dataset is relatively large (more than 100,000 images), we suggest trying a larger initial learning rate, such as 0.01 or more.
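
A common way to apply this in practice is to drop the classifier weights when loading the ImageNet checkpoint. The sketch below filters a generic state dict by a hypothetical `fc` prefix; the layer names and toy tensors are assumptions for illustration, so adapt them to your own model.

```python
import numpy as np

def filter_pretrained_weights(pretrained_state, model_state, skip_prefix="fc"):
    """Keep pretrained tensors that exist in the new model with matching
    shapes and that do not belong to the classification head."""
    kept = {}
    for name, tensor in pretrained_state.items():
        if name.startswith(skip_prefix):
            continue  # the k*1000 fc head does not match the new task
        if name in model_state and tuple(model_state[name].shape) == tuple(tensor.shape):
            kept[name] = tensor
    return kept

# Toy example with numpy arrays standing in for parameter tensors.
pretrained = {"conv1.w": np.zeros((64, 3, 7, 7)), "fc.w": np.zeros((2048, 1000))}
model = {"conv1.w": np.zeros((64, 3, 7, 7)), "fc.w": np.zeros((2048, 10))}
print(list(filter_pretrained_weights(pretrained, model).keys()))  # ['conv1.w']
```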
|
||||
|
||||
|
||||
> If you find this guide helpful, feel free to star our repo: [https://github.com/PaddlePaddle/PaddleClas](https://github.com/PaddlePaddle/PaddleClas)
|
||||
|
||||
## Reference
|
||||
[1] P. Goyal, P. Dollár, R. B. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: training ImageNet in 1 hour. CoRR, abs/1706.02677, 2017.
|
||||
|
||||
[2] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015.
|
@ -0,0 +1,21 @@
|
||||
### Competition Support
|
||||
|
||||
PaddleClas stems from Baidu's visual business applications and its exploration of frontier visual capabilities. It has helped us achieve leading results in many key competitions, and continues to promote more frontier visual solutions and real-world applications.
|
||||
|
||||
|
||||
* 1st place in 2018 Kaggle Open Images V4 object detection challenge
|
||||
|
||||
|
||||
* 2nd place in 2019 Kaggle Open Images V5 object detection challenge
|
||||
* The report is available here: [https://arxiv.org/pdf/1911.07171.pdf](https://arxiv.org/pdf/1911.07171.pdf)
|
||||
* The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/featured_model/OIDV5_BASELINE_MODEL.md)
|
||||
|
||||
* 2nd place in Kaggle Landmark Retrieval Challenge 2019
|
||||
* The report is available here: [https://arxiv.org/abs/1906.03990](https://arxiv.org/abs/1906.03990)
|
||||
* The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/Research/tree/master/CV/landmark)
|
||||
|
||||
* 2nd place in Kaggle Landmark Recognition Challenge 2019
|
||||
* The report is available here: [https://arxiv.org/abs/1906.03990](https://arxiv.org/abs/1906.03990)
|
||||
* The pretrained model and code are available here: [source code](https://github.com/PaddlePaddle/Research/tree/master/CV/landmark)
|
||||
|
||||
* A-level certificates for three tasks (printed text OCR, face recognition, and landmark recognition) in the first Multimedia Information Recognition Technology Competition
|
@ -0,0 +1,15 @@
|
||||
others
|
||||
================================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
transfer_learning_en.md
|
||||
train_with_DALI_en.md
|
||||
VisualDL_en.md
|
||||
train_on_xpu_en.md
|
||||
feature_visiualization_en.md
|
||||
paddle_mobile_inference_en.md
|
||||
competition_support_en.md
|
||||
update_history_en.md
|
||||
versions_en.md
|
@ -0,0 +1,130 @@
|
||||
# Benchmark on Mobile
|
||||
|
||||
---
|
||||
|
||||
## Catalogue
|
||||
|
||||
* [1. Introduction](#1)
|
||||
* [2. Evaluation Steps](#2)
|
||||
* [2.1 Export the Inference Model](#2.1)
|
||||
* [2.2 Download Benchmark Binary File](#2.2)
|
||||
* [2.3 Inference benchmark](#2.3)
|
||||
* [2.4 Model Optimization and Speed Evaluation](#2.4)
|
||||
|
||||
<a name='1'></a>
|
||||
## 1. Introduction
|
||||
|
||||
[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) is a lightweight inference engine that is fully functional, easy to use, and high-performance. Lightweight here means using fewer bits to represent the weights and activations of the neural network, which greatly reduces the model size and addresses the limited storage space of mobile devices, while the inference speed is, on the whole, better than that of other frameworks.
|
||||
|
||||
In [PaddleClas](https://github.com/PaddlePaddle/PaddleClas), we use Paddle-Lite to [evaluate the performance on mobile devices](../models/Mobile_en.md). In this section, we use the `MobileNetV1` model trained on the `ImageNet1k` dataset as an example to introduce how to use `Paddle-Lite` to evaluate model speed on a mobile device (evaluated on an SD855).
|
||||
|
||||
<a name='2'></a>
|
||||
## 2. Evaluation Steps
|
||||
|
||||
<a name='2.1'></a>
|
||||
### 2.1 Export the Inference Model
|
||||
|
||||
* First, transform the model saved during training into an inference model that can be used for prediction. The inference model can be exported with `tools/export_model.py`, as follows.
|
||||
|
||||
```shell
|
||||
python tools/export_model.py -m MobileNetV1 -p pretrained/MobileNetV1_pretrained/ -o inference/MobileNetV1
|
||||
```
|
||||
|
||||
Finally, the `model` and `params` files are saved in `inference/MobileNetV1`.
|
||||
|
||||
<a name='2.2'></a>
|
||||
### 2.2 Download Benchmark Binary File
|
||||
|
||||
* Use the adb (Android Debug Bridge) tool to connect the Android phone to the PC for development and debugging. After installing adb and making sure the PC and the phone are connected successfully, use the following command to check the ARM version of the phone and choose the pre-compiled library for that ARM version.
|
||||
|
||||
```shell
|
||||
adb shell getprop ro.product.cpu.abi
|
||||
```
|
||||
|
||||
* Download Benchmark_bin File
|
||||
|
||||
```shell
|
||||
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v8
|
||||
```
|
||||
|
||||
If the ARM version is v7, the v7 benchmark_bin file should be downloaded; the command is as follows.
|
||||
|
||||
```shell
|
||||
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v7
|
||||
```
|
||||
|
||||
<a name='2.3'></a>
|
||||
### 2.3 Inference benchmark
|
||||
|
||||
After the PC and mobile phone are successfully connected, use the following command to start the model evaluation.
|
||||
|
||||
```
|
||||
sh deploy/lite/benchmark/benchmark.sh ./benchmark_bin_v8 ./inference result_armv8.txt true
|
||||
```
|
||||
|
||||
Here `./benchmark_bin_v8` is the path of the benchmark binary file, `./inference` is the directory containing all the models to be evaluated, `result_armv8.txt` is the result file, and the final parameter `true` means that the model will be optimized before evaluation. Eventually, the evaluation result file `result_armv8.txt` will be saved in the current folder. The specific results are as follows.
|
||||
|
||||
```
|
||||
PaddleLite Benchmark
|
||||
Threads=1 Warmup=10 Repeats=30
|
||||
MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750
|
||||
|
||||
Threads=2 Warmup=10 Repeats=30
|
||||
MobileNetV1 min = 18.26600 max = 18.14000 average = 18.21637
|
||||
|
||||
Threads=4 Warmup=10 Repeats=30
|
||||
MobileNetV1 min = 10.03200 max = 9.94300 average = 9.97627
|
||||
```
|
||||
|
||||
The output shows the model inference time under different numbers of threads; the unit is ms (per-image latency). Taking the single-thread result as an example, the average latency of MobileNetV1 on SD855 is `30.79750` ms.
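
If you prefer to quote throughput instead of latency, the conversion is a one-line computation, assuming the averages above are per-image times in milliseconds.

```python
def fps_from_latency_ms(avg_ms):
    """Convert an average per-image latency in milliseconds to frames per second."""
    return 1000.0 / avg_ms

# Single-thread MobileNetV1 result from the output above.
print(round(fps_from_latency_ms(30.79750), 2))  # about 32.47 FPS
```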
|
||||
|
||||
<a name='2.4'></a>
|
||||
### 2.4 Model Optimization and Speed Evaluation
|
||||
|
||||
* In section 2.3, we mentioned that the model is optimized before evaluation; here you can optimize the model first and then directly load the optimized model for speed evaluation.
|
||||
|
||||
* Paddle-Lite
|
||||
In Paddle-Lite, we provide multiple strategies to automatically optimize the original training model, including quantization, subgraph fusion, hybrid scheduling, kernel optimization, and so on. To make the optimization easier to use, an opt tool is provided that completes the optimization steps automatically and outputs a lightweight, optimized, executable model; it can be downloaded from the [Paddle-Lite Model Optimization Page](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html). Here we take `macOS` as the development environment, download the [opt_mac](https://paddlelite-data.bj.bcebos.com/model_optimize_tool/opt_mac) model optimization tool, and use the following commands to optimize the model.
|
||||
|
||||
|
||||
```shell
|
||||
model_file="../MobileNetV1/model"
|
||||
param_file="../MobileNetV1/params"
|
||||
opt_models_dir="./opt_models"
|
||||
mkdir ${opt_models_dir}
|
||||
./opt_mac --model_file=${model_file} \
|
||||
--param_file=${param_file} \
|
||||
--valid_targets=arm \
|
||||
--optimize_out_type=naive_buffer \
|
||||
--prefer_int8_kernel=false \
|
||||
--optimize_out=${opt_models_dir}/MobileNetV1
|
||||
```
|
||||
|
||||
Here `model_file` and `param_file` are the paths of the exported model file and parameter file respectively. After the transformation succeeds, `MobileNetV1.nb` will be saved in `opt_models`.
|
||||
|
||||
|
||||
|
||||
Use the benchmark_bin file to load the optimized model for evaluation. The commands are as follows.
|
||||
|
||||
```shell
|
||||
bash benchmark.sh ./benchmark_bin_v8 ./opt_models result_armv8.txt
|
||||
```
|
||||
|
||||
Finally, the result is saved in `result_armv8.txt`, as shown below.
|
||||
|
||||
```
|
||||
PaddleLite Benchmark
|
||||
Threads=1 Warmup=10 Repeats=30
|
||||
MobileNetV1_lite min = 30.89500 max = 30.78500 average = 30.84173
|
||||
|
||||
Threads=2 Warmup=10 Repeats=30
|
||||
MobileNetV1_lite min = 18.25300 max = 18.11000 average = 18.18017
|
||||
|
||||
Threads=4 Warmup=10 Repeats=30
|
||||
MobileNetV1_lite min = 10.00600 max = 9.90000 average = 9.96177
|
||||
```
|
||||
|
||||
|
||||
Taking the single-thread result as an example, the average latency of MobileNetV1 on SD855 is `30.84173` ms.
|
||||
|
||||
For more detailed parameter explanations and Paddle-Lite usage, please refer to the [Paddle-Lite docs](https://paddle-lite.readthedocs.io/zh/latest/).
|
@ -0,0 +1,79 @@
|
||||
# Train with DALI
|
||||
|
||||
---
|
||||
|
||||
## Catalogue
|
||||
|
||||
* [1. Preface](#1)
|
||||
* [2. Installing DALI](#2)
|
||||
* [3. Using DALI](#3)
|
||||
* [4. Train with FP16](#4)
|
||||
|
||||
<a name='1'></a>
|
||||
|
||||
## 1. Preface
|
||||
|
||||
[The NVIDIA Data Loading Library](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html) (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It can be used to build the DataLoader of PaddlePaddle.
|
||||
|
||||
Since deep learning relies on a large amount of data in the training stage, and these data need to be loaded and preprocessed, such operations are usually executed on the CPU, which limits further improvement of the training speed; especially when the batch_size is large, data loading can become the bottleneck. DALI can use the GPU to accelerate these operations and thereby further improve the training speed.
|
||||
|
||||
<a name='2'></a>
|
||||
|
||||
## 2. Installing DALI
|
||||
|
||||
DALI only supports Linux x64, and the CUDA version must be 10.2 or later.
|
||||
|
||||
* For CUDA 10:
|
||||
|
||||
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100
|
||||
|
||||
* For CUDA 11.0:
|
||||
|
||||
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110
|
||||
|
||||
For more information about installing DALI, please refer to [DALI](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html).
|
||||
|
||||
<a name='3'></a>
|
||||
|
||||
## 3. Using DALI
|
||||
|
||||
PaddleClas supports training with DALI. Since DALI only supports training on GPU, `CUDA_VISIBLE_DEVICES` needs to be set, and because DALI occupies GPU memory, some GPU memory must be reserved for it. To train with DALI, simply set `use_dali = True` in the training config, or start the training with the following command:
|
||||
|
||||
```shell
|
||||
# set the GPUs that can be seen
|
||||
export CUDA_VISIBLE_DEVICES="0"
|
||||
|
||||
python ppcls/train.py -c ppcls/configs/ImageNet/ResNet/ResNet50.yaml -o Global.use_dali=True
|
||||
```
|
||||
|
||||
You can also train with multiple GPUs:
|
||||
|
||||
```shell
|
||||
# set the GPUs that can be seen
|
||||
export CUDA_VISIBLE_DEVICES="0,1,2,3"
|
||||
|
||||
# set the GPU memory used for neural network training, generally 0.8 or 0.7, and the remaining GPU memory is reserved for DALI
|
||||
export FLAGS_fraction_of_gpu_memory_to_use=0.80
|
||||
|
||||
python -m paddle.distributed.launch \
|
||||
--gpus="0,1,2,3" \
|
||||
ppcls/train.py \
|
||||
-c ./ppcls/configs/ImageNet/ResNet/ResNet50.yaml \
|
||||
-o Global.use_dali=True
|
||||
```
|
||||
|
||||
<a name='4'></a>
|
||||
|
||||
## 4. Train with FP16
|
||||
|
||||
On the basis of the above, using FP16 half precision can further improve the training speed; you can refer to the following command.
|
||||
|
||||
```shell
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
export FLAGS_fraction_of_gpu_memory_to_use=0.8
|
||||
|
||||
python -m paddle.distributed.launch \
|
||||
--gpus="0,1,2,3" \
|
||||
ppcls/train.py \
|
||||
-c ./ppcls/configs/ImageNet/ResNet/ResNet50_fp16_dygraph.yaml
|
||||
```
|
@ -0,0 +1,55 @@
|
||||
# Release Notes
|
||||
|
||||
- 2021.04.15
|
||||
- Add `MixNet` and `ReXNet` pretrained models, `MixNet_L`'s Top-1 Acc on ImageNet-1k reaches 78.6% and `ReXNet_3_0` reaches 82.09%.
|
||||
- 2021.01.27
|
||||
* Add ViT and DeiT pretrained models, ViT's Top-1 Acc on ImageNet reaches 81.05%, and DeiT reaches 85.5%.
|
||||
- 2021.01.08
|
||||
* Add support for the whl package and its usage; model inference can be done by simply installing paddleclas with pip.
|
||||
- 2020.12.16
|
||||
* Add support for TensorRT in cpp inference to obtain more significant acceleration.
|
||||
- 2020.12.06
|
||||
* Add `SE_HRNet_W64_C_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 84.75%.
|
||||
- 2020.11.23
|
||||
* Add `GhostNet_x1_3_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 79.38%.
|
||||
- 2020.11.09
|
||||
* Add `InceptionV3` architecture and pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 79.1%.
|
||||
|
||||
* 2020.10.20
|
||||
* Add `Res2Net50_vd_26w_4s_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 83.1%.
|
||||
* Add `Res2Net101_vd_26w_4s_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 83.9%.
|
||||
|
||||
- 2020.10.12
|
||||
* Add Paddle-Lite demo.
|
||||
|
||||
- 2020.10.10
|
||||
* Add cpp inference demo.
|
||||
* Improve FAQ tutorials.
|
||||
|
||||
* 2020.09.17
|
||||
* Add `HRNet_W48_C_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 83.62%.
|
||||
* Add `ResNet34_vd_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 79.72%.
|
||||
|
||||
* 2020.09.07
|
||||
* Add `HRNet_W18_C_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 81.16%.
|
||||
* Add `MobileNetV3_small_x0_35_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 55.55%.
|
||||
|
||||
* 2020.07.14
|
||||
* Add `Res2Net200_vd_26w_4s_ssld` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 85.13%.
|
||||
* Add `Fix_ResNet50_vd_ssld_v2` pretrained model, whose Top-1 Acc on ImageNet1k dataset reaches 84.00%.
|
||||
|
||||
* 2020.06.17
|
||||
* Add English documents.
|
||||
|
||||
* 2020.06.12
|
||||
* Add support for training and evaluation on Windows or CPU.
|
||||
|
||||
* 2020.05.17
|
||||
* Add support for mixed precision training.
|
||||
|
||||
* 2020.05.09
|
||||
* Add user guide about Paddle Serving and Paddle-Lite.
|
||||
* Add benchmark about FP16/FP32 on T4 GPU.
|
||||
|
||||
* 2020.04.14
|
||||
* First commit.
|
@ -0,0 +1,60 @@
|
||||
# Version Updates
|
||||
|
||||
------
|
||||
|
||||
## Catalogue
|
||||
|
||||
- [1. v2.3](#1)
|
||||
- [2. v2.2](#2)
|
||||
|
||||
<a name='1'></a>
|
||||
|
||||
## 1. v2.3
|
||||
|
||||
- Model Update
|
||||
- Add pre-training weights for lightweight models, including detection models and feature models
|
||||
- Release the PP-LCNet series of models, self-developed models designed to run efficiently on CPU
|
||||
- Enable SwinTransformer, Twins, and DeiT to support training directly from scratch and reach the accuracy reported in their papers.
|
||||
- Basic framework capabilities
|
||||
- Add the DeepHash module, which allows feature models to directly export binary features
|
||||
- Add PKSampler, which solves the problem that feature models could not be trained with multiple machines and multiple GPUs
|
||||
- Support PaddleSlim: support quantization, pruning training, and offline quantization of classification models and feature models
|
||||
- Enable legendary models to support intermediate model output
|
||||
- Support multi-label classification training
|
||||
- Inference Deployment
|
||||
- Replace the original feature retrieval library with Faiss to improve platform adaptability
|
||||
- Support PaddleServing: support the deployment of classification models and image recognition process
|
||||
- Versions of the Recommendation Library
|
||||
- python: 3.7
|
||||
- PaddlePaddle: 2.1.3
|
||||
- PaddleSlim: 2.2.0
|
||||
- PaddleServing: 0.6.1
|
||||
|
||||
<a name='2'></a>
|
||||
|
||||
## 2. v2.2
|
||||
|
||||
- Model Updates
|
||||
- Add models including LeViT, Twins, TNT, DLA, HardNet, RedNet, and SwinTransformer
|
||||
- Basic framework capabilities
|
||||
- Divide the classification models into two categories
|
||||
- legendary models: introduce the TheseusLayer base class, provide interfaces for modifying the network structure, and support truncating the network and outputting intermediate results
|
||||
- model zoo: other common classification models
|
||||
- Add the support of Metric Learning algorithm
|
||||
- Add a variety of related loss algorithms, and the basic network module gears (allow the combination with backbone and loss) for convenient use
|
||||
- Support both the general classification and metric learning-related training
|
||||
- Support static graph training
|
||||
- Support classification training with DALI acceleration
|
||||
- Support fp16 training
|
||||
- Application Updates
|
||||
- Add specific application cases and related models of product recognition, vehicle recognition (vehicle fine-grained classification, vehicle ReID), logo recognition, animation character recognition
|
||||
- Add a complete pipeline for image recognition, including detection module, feature extraction module, and vector search module
|
||||
- Inference Deployment
|
||||
- Add Mobius, Baidu's self-developed vector search module, to support the inference deployment of the image recognition system
|
||||
- Image recognition: support building the feature library with batch_size > 1
|
||||
- Documents Update
|
||||
- Add image recognition related documents
|
||||
- Fix bugs in previous documents
|
||||
- Versions of the Recommendation Library
|
||||
- python: 3.7
|
||||
- PaddlePaddle: 2.1.2
|
@ -0,0 +1,10 @@
|
||||
quick_start
|
||||
================================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
quick_start_classification_new_user_en.md
|
||||
quick_start_classification_professional_en.md
|
||||
quick_start_recognition_en.md
|
||||
quick_start_multilabel_classification_en.md
|
@ -0,0 +1,306 @@
|
||||
# Trial in 30mins(professional)
|
||||
|
||||
Here is a quick start tutorial for professional users to use PaddleClas on the Linux operating system. The main content is based on the CIFAR-100 data set. You can quickly experience the training of different models, experience loading different pre-trained models, experience the SSLD knowledge distillation solution, and experience data augmentation. Please refer to [Installation Guide](../installation/install_paddleclas_en.md) to configure the operating environment and clone PaddleClas code.
|
||||
|
||||
------
|
||||
|
||||
## Catalogue
|
||||
|
||||
- [1. Data and model preparation](#1)
|
||||
- [1.1 Data preparation](#1.1)
|
||||
- [1.1.1 Prepare CIFAR100](#1.1.1)
|
||||
- [2. Model training](#2)
|
||||
- [2.1 Single label training](#2.1)
|
||||
- [2.1.1 Training without loading the pre-trained model](#2.1.1)
|
||||
- [2.1.2 Transfer learning](#2.1.2)
|
||||
- [3. Data Augmentation](#3)
|
||||
- [3.1 Data augmentation-Mixup](#3.1)
|
||||
- [4. Knowledge distillation](#4)
|
||||
- [5. Model evaluation and inference](#5)
|
||||
- [5.1 Single-label classification model evaluation and inference](#5.1)
|
||||
- [5.1.1 Single-label classification model evaluation](#5.1.1)
|
||||
- [5.1.2 Single-label classification model prediction](#5.1.2)
|
||||
- [5.1.3 Single-label classification uses inference model for model inference](#5.1.3)
|
||||
|
||||
<a name="1"></a>
|
||||
|
||||
## 1. Data and model preparation
|
||||
|
||||
<a name="1.1"></a>
|
||||
|
||||
### 1.1 Data preparation
|
||||
|
||||
|
||||
* Enter the PaddleClas directory.
|
||||
|
||||
```
|
||||
cd path_to_PaddleClas
|
||||
```
|
||||
|
||||
<a name="1.1.1"></a>
|
||||
|
||||
#### 1.1.1 Prepare CIFAR100
|
||||
|
||||
* Enter the `dataset/` directory, download and unzip the CIFAR100 dataset.
|
||||
|
||||
```shell
|
||||
cd dataset
|
||||
wget https://paddle-imagenet-models-name.bj.bcebos.com/data/CIFAR100.tar
|
||||
tar -xf CIFAR100.tar
|
||||
cd ../
|
||||
```
|
||||
|
||||
<a name="2"></a>
|
||||
|
||||
## 2. Model training
|
||||
|
||||
<a name="2.1"></a>
|
||||
|
||||
### 2.1 Single label training
|
||||
|
||||
<a name="2.1.1"></a>
|
||||
|
||||
#### 2.1.1 Training without loading the pre-trained model
|
||||
|
||||
* Based on the ResNet50_vd model, the training script is shown below.
|
||||
|
||||
```shell
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
python3 -m paddle.distributed.launch \
|
||||
--gpus="0,1,2,3" \
|
||||
tools/train.py \
|
||||
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
|
||||
-o Global.output_dir="output_CIFAR"
|
||||
```
|
||||
|
||||
The highest accuracy of the validation set is around 0.415.
|
||||
|
||||
<a name="2.1.2"></a>
|
||||
|
||||
|
||||
#### 2.1.2 Transfer learning
|
||||
|
||||
* Fine-tune from the ImageNet-1k classification pre-trained model ResNet50_vd_pretrained (top-1 accuracy 79.12%); the training script is shown below.
|
||||
|
||||
```shell
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
python3 -m paddle.distributed.launch \
|
||||
--gpus="0,1,2,3" \
|
||||
tools/train.py \
|
||||
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
|
||||
-o Global.output_dir="output_CIFAR" \
|
||||
-o Arch.pretrained=True
|
||||
```
|
||||
|
||||
The highest accuracy of the validation set is about 0.718. After loading the pre-trained model, the accuracy of the CIFAR100 data set has been greatly improved, with an absolute accuracy increase of 30%.
|
||||
|
||||
* Fine-tune from the ImageNet-1k classification pre-trained model ResNet50_vd_ssld_pretrained (top-1 accuracy 82.39%); the training script is shown below.
|
||||
|
||||
```shell
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
python3 -m paddle.distributed.launch \
|
||||
--gpus="0,1,2,3" \
|
||||
tools/train.py \
|
||||
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
|
||||
-o Global.output_dir="output_CIFAR" \
|
||||
-o Arch.pretrained=True \
|
||||
-o Arch.use_ssld=True
|
||||
```
|
||||
|
||||
The final top-1 accuracy on the CIFAR100 validation set is 0.73. Compared with fine-tuning from the pre-trained model whose top-1 accuracy is 79.12%, the top-1 accuracy on the new dataset increases by another 1.2%.
|
||||
|
||||
* Replace the backbone with MobileNetV3_large_x1_0 and fine-tune; the training script is shown below.
|
||||
|
||||
```shell
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
python3 -m paddle.distributed.launch \
|
||||
--gpus="0,1,2,3" \
|
||||
tools/train.py \
|
||||
-c ./ppcls/configs/quick_start/professional/MobileNetV3_large_x1_0_CIFAR100_finetune.yaml \
|
||||
-o Global.output_dir="output_CIFAR" \
|
||||
-o Arch.pretrained=True
|
||||
```
|
||||
|
||||
The highest accuracy of the validation set is about 0.601, which is nearly 12% lower than ResNet50_vd.
|
||||
|
||||
<a name="3"></a>
|
||||
|
||||
|
||||
## 3. Data Augmentation
|
||||
|
||||
PaddleClas contains many data augmentation methods, such as Mixup, Cutout, RandomErasing, etc. For the specific methods, please refer to the [Data augmentation chapter](../algorithm_introduction/DataAugmentation_en.md).
|
||||
|
||||
<a name="3.1"></a>
|
||||
|
||||
### 3.1 Data augmentation-Mixup
|
||||
|
||||
Based on the training method in Section 3.3 of the [Data Augmentation chapter](../algorithm_introduction/DataAugmentation_en.md), training is combined with the Mixup data augmentation method; the specific training script is shown below.
|
||||
|
||||
```shell
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
python3 -m paddle.distributed.launch \
|
||||
--gpus="0,1,2,3" \
|
||||
tools/train.py \
|
||||
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_mixup_CIFAR100_finetune.yaml \
|
||||
-o Global.output_dir="output_CIFAR"
|
||||
|
||||
```
|
||||
|
||||
|
||||
The final accuracy on the CIFAR100 validation set is 0.73; using data augmentation increases the model accuracy by about 1.2% again.
|
||||
|
||||
|
||||
* **Note**
|
||||
|
||||
* For other data augmentation configuration files, please refer to the configuration files in `ppcls/configs/ImageNet/DataAugment/`.
|
||||
* The number of epochs for training CIFAR100 is small, so the accuracy of the validation set may fluctuate by about 1%.
|
||||
|
||||
<a name="4"></a>
|
||||
|
||||
|
||||
## 4. Knowledge distillation
|
||||
|
||||
|
||||
PaddleClas includes a self-developed SSLD knowledge distillation scheme. For specific content, please refer to [Knowledge Distillation Chapter](../algorithm_introduction/knowledge_distillation_en.md). This section will try to use knowledge distillation technology to train the MobileNetV3_large_x1_0 model. Here we use the ResNet50_vd model trained in section 2.1.2 as the teacher model for distillation. First, save the ResNet50_vd model trained in section 2.1.2 to the specified directory. The script is as follows.
|
||||
|
||||
```shell
|
||||
mkdir pretrained
|
||||
cp -r output_CIFAR/ResNet50_vd/best_model.pdparams ./pretrained/
|
||||
```
|
||||
|
||||
The model name, teacher model and student model configuration, pre-training address configuration, and freeze_params configuration in the configuration file are as follows, where the two values in `freeze_params_list` represent whether the teacher model and the student model freeze parameter training respectively.
|
||||
|
||||
```yaml
|
||||
Arch:
|
||||
name: "DistillationModel"
|
||||
# if not null, its lengths should be same as models
|
||||
pretrained_list:
|
||||
# if not null, its lengths should be same as models
|
||||
freeze_params_list:
|
||||
- True
|
||||
- False
|
||||
models:
|
||||
- Teacher:
|
||||
name: ResNet50_vd
|
||||
pretrained: "./pretrained/best_model"
|
||||
- Student:
|
||||
name: MobileNetV3_large_x1_0
|
||||
pretrained: True
|
||||
```
|
||||
|
||||
The loss configuration is as follows, where the training loss is the cross entropy of the output of the student model and the teacher model, and the validation loss is the cross entropy of the output of the student model and the true label.
|
||||
|
||||
```yaml
|
||||
Loss:
|
||||
Train:
|
||||
- DistillationCELoss:
|
||||
weight: 1.0
|
||||
model_name_pairs:
|
||||
- ["Student", "Teacher"]
|
||||
Eval:
|
||||
- DistillationGTCELoss:
|
||||
weight: 1.0
|
||||
model_names: ["Student"]
|
||||
```
|
||||
|
||||
The final training script is shown below.
|
||||
|
||||
```shell
|
||||
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
||||
python3 -m paddle.distributed.launch \
|
||||
--gpus="0,1,2,3" \
|
||||
tools/train.py \
|
||||
-c ./ppcls/configs/quick_start/professional/R50_vd_distill_MV3_large_x1_0_CIFAR100.yaml \
|
||||
-o Global.output_dir="output_CIFAR"
|
||||
|
||||
```
|
||||
|
||||
|
||||
In the end, the accuracy on the CIFAR100 validation set was 64.4%. Using the teacher model for knowledge distillation, the accuracy of MobileNetV3 increased by 4.3%.
|
||||
|
||||
* **Note**
|
||||
|
||||
* In the distillation process, the pre-trained model used by the teacher model is the training result on the CIFAR100 dataset, and the student model uses the MobileNetV3_large_x1_0 pre-trained model with an accuracy of 75.32% on the ImageNet1k dataset.
|
||||
* The distillation process does not need to use real labels, so more unlabeled data can be used. In practice, you can generate a fake `train_list.txt` from unlabeled data and then merge it with the real `train_list.txt`; a minimal sketch is given below, and you can experiment with it on your own data.
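
Below is a minimal sketch of that workflow. It assumes each line of `train_list.txt` has the form `relative/path<space>label`, assigns a placeholder label 0 to the unlabeled images, and uses hypothetical directory and file names that you should adapt to your own data layout.

```python
import os

unlabeled_dir = "dataset/unlabeled_images"     # hypothetical folder of unlabeled images
fake_list = "dataset/fake_train_list.txt"      # generated list with placeholder labels
real_list = "dataset/train_list.txt"           # the real labeled training list
merged_list = "dataset/train_list_merged.txt"  # list used for distillation training

# Write one "path label" line per unlabeled image, with 0 as a placeholder label.
with open(fake_list, "w") as f:
    for name in sorted(os.listdir(unlabeled_dir)):
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            f.write(f"{os.path.join('unlabeled_images', name)} 0\n")

# Concatenate the real list and the fake list into the merged training list.
with open(merged_list, "w") as out:
    for path in (real_list, fake_list):
        with open(path) as src:
            out.write(src.read())
```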
|
||||
|
||||
<a name="5"></a>
|
||||
|
||||
## 5. Model evaluation and inference
|
||||
|
||||
<a name="5.1"></a>
|
||||
|
||||
### 5.1 Single-label classification model evaluation and inference
|
||||
|
||||
<a name="5.1.1"></a>
|
||||
|
||||
#### 5.1.1 Single-label classification model evaluation
|
||||
|
||||
After training the model, you can use the following commands to evaluate the accuracy of the model.
|
||||
|
||||
```bash
|
||||
python3 tools/eval.py \
|
||||
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
|
||||
-o Global.pretrained_model="output_CIFAR/ResNet50_vd/best_model"
|
||||
```
|
||||
|
||||
<a name="5.1.2"></a>
|
||||
|
||||
#### 5.1.2 Single-label classification model prediction
|
||||
|
||||
After the model training is completed, the trained weights can be loaded for model prediction. A complete example is provided in `tools/infer.py`; the model prediction can be completed by executing the following command:
|
||||
|
||||
```bash
|
||||
python3 tools/infer.py \
|
||||
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
|
||||
-o Infer.infer_imgs=./dataset/CIFAR100/test/0/0001.png \
|
||||
-o Global.pretrained_model=output_CIFAR/ResNet50_vd/best_model
|
||||
```
|
||||
|
||||
<a name="5.1.3"></a>
|
||||
|
||||
#### 5.1.3 Single-label classification uses inference model for model inference
|
||||
|
||||
We need to export the inference model; PaddlePaddle supports inference with a prediction engine. Here we introduce how to use the prediction engine for inference:
|
||||
First, export the trained model as an inference model:
|
||||
|
||||
```bash
|
||||
python3 tools/export_model.py \
|
||||
-c ./ppcls/configs/quick_start/professional/ResNet50_vd_CIFAR100.yaml \
|
||||
-o Global.pretrained_model=output_CIFAR/ResNet50_vd/best_model
|
||||
```
|
||||
|
||||
* By default, `inference.pdiparams`, `inference.pdmodel` and `inference.pdiparams.info` files will be generated in the `inference` folder.
|
||||
|
||||
Use prediction engines for inference:
|
||||
|
||||
Enter the deploy directory:
|
||||
|
||||
```bash
|
||||
cd deploy
|
||||
```
|
||||
|
||||
Change the `inference_cls.yaml` file. Since the resolution used for training CIFAR100 is 32x32, the relevant resolution needs to be changed. The image preprocessing in the final configuration file is as follows:
|
||||
|
||||
```yaml
|
||||
PreProcess:
|
||||
transform_ops:
|
||||
- ResizeImage:
|
||||
resize_short: 36
|
||||
- CropImage:
|
||||
size: 32
|
||||
- NormalizeImage:
|
||||
scale: 0.00392157
|
||||
mean: [0.485, 0.456, 0.406]
|
||||
std: [0.229, 0.224, 0.225]
|
||||
order: ''
|
||||
- ToCHWImage:
|
||||
```
|
||||
|
||||
Execute the command to make predictions. Since the default `class_id_map_file` is the mapping file for the ImageNet dataset, you need to set it to None here.
|
||||
|
||||
```bash
|
||||
python3 python/predict_cls.py \
|
||||
-c configs/inference_cls.yaml \
|
||||
-o Global.infer_imgs=../dataset/CIFAR100/test/0/0001.png \
|
||||
-o PostProcess.Topk.class_id_map_file=None
|
||||
```
|