Vision Transformer model that outperforms CNN models while using just one quarter of the target data.
Technologies:
- C/C++ / Python
- Linux Ubuntu
- Zero-Shot Transformer Model
- CLIP
- Open-Vocabulary Detection and Classification without prior training
- Area-of-Interest and Object Detection
- TensorRT
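
The zero-shot, open-vocabulary classification listed above can be illustrated with a minimal sketch. It assumes CLIP-style matching: an image embedding is compared against text embeddings of candidate labels by cosine similarity, and the scaled similarities are turned into probabilities with a softmax. The embeddings, the label prompts, the `zero_shot_classify` helper, and the temperature of 100 are all hypothetical stand-ins; in a real pipeline the vectors would come from the model's image and text encoders. This is not the project's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=100.0):
    """Pick the label whose text embedding is closest to the image embedding.

    `temperature` scales the similarities before the softmax; the value
    used here is an assumption, not a setting taken from the project.
    """
    labels = list(label_embeddings)
    logits = [
        temperature * cosine_similarity(image_embedding, label_embeddings[label])
        for label in labels
    ]
    probs = softmax(logits)
    best_index = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best_index], dict(zip(labels, probs))


# Hypothetical toy embeddings standing in for real encoder outputs.
label_embeddings = {
    "a photo of a car": [0.9, 0.1, 0.0],
    "a photo of a person": [0.1, 0.9, 0.1],
    "a photo of a traffic sign": [0.0, 0.2, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

best_label, probabilities = zero_shot_classify(image_embedding, label_embeddings)
print(best_label)
```

Because the label set is just a dictionary of text embeddings, new classes can be added at inference time without retraining, which is the property the "without prior training" item refers to.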