
30+ Computer Vision Projects


In this article, we explore over 30 Computer Vision (CV) projects that can help boost your portfolio. For each project, we briefly cover the models and datasets used, the application domain, the codebase and the research paper.


1. Object Detection

Object Detection involves detecting instances of objects in images or videos. Each object is localized within the image, typically with a bounding box, and classified into one of a set of pre-defined categories.
Models: MobileNet SSD or YOLO
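Detectors such as these usually emit several overlapping boxes for the same object, so a common post-processing step is non-maximum suppression based on intersection over union (IoU). The following is a minimal sketch of that step only, assuming boxes in (x1, y1, x2, y2) format and an illustrative threshold of 0.5; it is not taken from any particular detector's codebase.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes around one object, plus one separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

Here the second box overlaps the first heavily and has a lower score, so only the first and third boxes survive.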

2. Image Segmentation

Image Segmentation involves dividing an image into multiple segments or regions, each of which corresponds to a different object or part of the image. It involves assigning a label or class to each pixel in the image, based on the pixel's characteristics and its relationship to other pixels in the image.
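To make the per-pixel labeling concrete, here is a small sketch that assumes a model has already produced one score map per class (the `score_maps` layout is a hypothetical choice for this example). Each pixel simply receives the class with the highest score.

```python
def label_pixels(score_maps):
    """score_maps[c][y][x] is the score of class c at pixel (x, y).

    Returns a label map in which each pixel holds the index of its
    highest-scoring class.
    """
    num_classes = len(score_maps)
    height, width = len(score_maps[0]), len(score_maps[0][0])
    return [
        [max(range(num_classes), key=lambda c: score_maps[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]


# Toy 2x2 image with two classes: 0 = background, 1 = object.
background = [[0.9, 0.2],
              [0.8, 0.1]]
obj = [[0.1, 0.8],
       [0.2, 0.9]]
labels = label_pixels([background, obj])
print(labels)
```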

3. Single Shot MultiBox Detector (SSD)

This is a single-shot object detection system that uses a deep convolutional neural network to predict object class scores and bounding box offsets. The system is fast and efficient, making it well-suited for real-time object detection tasks.
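The predicted offsets are relative to a fixed default box, so they must be decoded back into image coordinates. The sketch below follows the center/size parameterization described in the SSD paper; many implementations additionally scale the offsets by "variance" constants, which we leave out here as a simplifying assumption.

```python
import math


def decode_box(default_box, offsets):
    """Turn predicted offsets into a corner-format box.

    default_box: (cx, cy, w, h) of the default (anchor) box.
    offsets:     (t_cx, t_cy, t_w, t_h) predicted by the network.
    """
    d_cx, d_cy, d_w, d_h = default_box
    t_cx, t_cy, t_w, t_h = offsets
    cx = d_cx + t_cx * d_w          # center shift is relative to box size
    cy = d_cy + t_cy * d_h
    w = d_w * math.exp(t_w)         # size change is predicted in log space
    h = d_h * math.exp(t_h)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


default = (0.5, 0.5, 0.2, 0.4)
unchanged = decode_box(default, (0.0, 0.0, 0.0, 0.0))
shifted = decode_box(default, (0.5, 0.0, math.log(2), 0.0))
print(unchanged, shifted)
```

With all-zero offsets the default box is returned unchanged; the second call moves the center right by half a box width and doubles the width.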

4. Pose Estimation

This involves estimating the pose (position and orientation) of an object in a given image. It is typically used to determine the orientation of human bodies, faces or objects, and is widely used in applications such as augmented reality, gaming, and human-computer interaction.


5. Face Detection

This involves identifying and locating human faces in images or videos. It is a crucial component in many facial recognition systems, and often the first step in processing facial images.

6. Lane Detection

This involves identifying the lanes and road markings in images and videos, typically in self-driving car applications to assist navigation. The boundaries of each lane are detected and the lanes are classified as either driving or non-driving lanes to provide real-time lane guidance.
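One classical way to find lane boundaries, used here purely as an illustration, is the Hough transform: every edge pixel votes for all the lines that could pass through it, and the line with the most votes wins. The sketch below assumes the edge pixels have already been extracted (for example by an edge detector) and uses a coarse one-degree, one-pixel accumulator.

```python
import math


def strongest_line(edge_points, num_angles=180):
    """Return (rho, theta_degrees, votes) of the most-voted line.

    Lines are parameterized as rho = x*cos(theta) + y*sin(theta).
    """
    votes = {}
    for x, y in edge_points:
        for t in range(num_angles):
            theta = math.pi * t / num_angles
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] = votes.get((rho, t), 0) + 1
    (rho, t), count = max(votes.items(), key=lambda item: item[1])
    return rho, 180.0 * t / num_angles, count


# A vertical lane marking at x = 5 plus a few stray edge pixels.
lane_pixels = [(5, y) for y in range(50)]
noise = [(20, 3), (31, 17), (12, 40)]
rho, theta_deg, count = strongest_line(lane_pixels + noise)
print(rho, theta_deg, count)
```

The stray pixels barely affect the result because no other line collects a comparable number of votes.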


7. Optical Flow

This involves calculating the motion of objects and pixels between consecutive frames in a video. It is used in various applications such as video compression, action recognition, and autonomous driving.
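A very simple way to estimate such motion is block matching, the approach traditionally used in video compression: take a small block from the previous frame and search the next frame for the displacement that matches it best. This is a sketch under that assumption and not a dense optical flow method such as Lucas-Kanade or Farneback; the function and parameter names are our own.

```python
def block_match(prev, curr, top, left, size, search=2):
    """Find the (dx, dy) shift of a size x size block from prev to curr.

    The block starts at row `top`, column `left` in prev. All shifts up to
    `search` pixels are tried and the one with the smallest sum of absolute
    differences (SAD) is returned.
    """
    height, width = len(curr), len(curr[0])
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
                continue  # candidate block would fall outside the frame
            sad = sum(
                abs(prev[top + r][left + c] - curr[y0 + r][x0 + c])
                for r in range(size) for c in range(size)
            )
            if sad < best_sad:
                best, best_sad = (dx, dy), sad
    return best


def make_frame(top, left, size=8):
    """Black frame with a bright 2x2 square at the given position."""
    frame = [[0] * size for _ in range(size)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            frame[r][c] = 255
    return frame


prev_frame = make_frame(2, 2)
next_frame = make_frame(4, 3)   # square moved 1 pixel right, 2 pixels down
flow = block_match(prev_frame, next_frame, top=2, left=2, size=2)
print(flow)
```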

8. DeepLab

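This is a series of deep convolutional neural networks for semantic image segmentation. The models use atrous (dilated) convolutions to enlarge the receptive field without reducing resolution, and the early versions refine the output with fully connected conditional random fields to generate high-quality segmentation masks.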
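An atrous convolution is an ordinary convolution whose kernel taps are spaced `rate` samples apart, so the same number of weights covers a wider context. The following 1-D sketch illustrates the idea in plain Python; real models apply the 2-D version inside a deep learning framework.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D convolution with kernel taps spaced `rate` samples apart.

    As in deep learning frameworks, the kernel is not flipped
    (this is technically cross-correlation).
    """
    span = (len(kernel) - 1) * rate + 1   # input samples covered by the kernel
    return [
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
dense = atrous_conv1d(signal, kernel, rate=1)    # covers 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # covers 5 samples
print(dense, dilated)
```

With `rate=2` the three-tap kernel compares samples four positions apart instead of two, without adding any weights.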

9. Object Tracking

This involves locating and tracking objects over time in a video stream. It is typically accomplished by using computer algorithms to analyze sequential frames of the video and determine the object's position and trajectory in each frame.
Models: SORT (Simple Online and Realtime Tracking).
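SORT itself combines a Kalman filter with Hungarian assignment on bounding-box overlap. The sketch below deliberately swaps in something simpler, a constant-velocity prediction and greedy nearest-centroid matching, to show the predict-then-associate loop. The track dictionary layout and the `max_distance` threshold are assumptions made for this example.

```python
def update_tracks(tracks, detections, max_distance=30.0):
    """Associate detections (x, y centroids) with existing tracks in place.

    tracks maps a track id to {"pos": (x, y), "vel": (vx, vy)}.
    """
    unmatched = list(range(len(detections)))
    for track in tracks.values():
        # Constant-velocity prediction of where the object should be now.
        px = track["pos"][0] + track["vel"][0]
        py = track["pos"][1] + track["vel"][1]
        best, best_dist = None, max_distance
        for i in unmatched:
            dx, dy = detections[i][0] - px, detections[i][1] - py
            dist = (dx * dx + dy * dy) ** 0.5
            if dist <= best_dist:
                best, best_dist = i, dist
        if best is None:
            track["pos"] = (px, py)   # nothing nearby: coast on the prediction
        else:
            x, y = detections[best]
            track["vel"] = (x - track["pos"][0], y - track["pos"][1])
            track["pos"] = (x, y)
            unmatched.remove(best)
    # Every detection left over starts a new track.
    next_id = max(tracks, default=-1) + 1
    for i in unmatched:
        tracks[next_id] = {"pos": detections[i], "vel": (0, 0)}
        next_id += 1
    return tracks


tracks = {}
update_tracks(tracks, [(10, 10), (100, 100)])   # frame 1: two new tracks
update_tracks(tracks, [(104, 100), (14, 10)])   # frame 2: same objects, reordered
print(tracks)
```

Even though the detections arrive in a different order in the second frame, each track keeps its identity.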

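10. DeepLab v3+

This is an improved version of the DeepLab semantic image segmentation system. The system combines atrous spatial pyramid pooling with an encoder-decoder structure, whose decoder refines the segmentation along object boundaries, to achieve high performance on benchmark datasets such as PASCAL VOC 2012 and Cityscapes.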

11. Image Classification

This involves assigning a label to an input image, based on its visual content. It involves training a machine learning model on a large dataset of labeled images and using it to predict the class of new, unseen images.
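The train-then-predict workflow can be shown with a deliberately tiny stand-in for a real model: a nearest-centroid classifier over flattened pixel values. This is only a sketch of the workflow, assuming images are given as flat lists of intensities; practical projects would use a convolutional network instead.

```python
def train_centroids(images, labels):
    """Average the flattened images of each class into one centroid."""
    sums, counts = {}, {}
    for pixels, label in zip(images, labels):
        acc = sums.setdefault(label, [0.0] * len(pixels))
        for i, value in enumerate(pixels):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}


def predict(centroids, pixels):
    """Return the label whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda label: sum((p - c) ** 2
                              for p, c in zip(pixels, centroids[label])),
    )


# Four labeled 2x2 "images", flattened to 4 intensities in [0, 1].
train_images = [[0.1, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.1],
                [0.9, 0.8, 1.0, 0.9], [1.0, 0.9, 0.8, 0.9]]
train_labels = ["dark", "dark", "bright", "bright"]
centroids = train_centroids(train_images, train_labels)

print(predict(centroids, [0.7, 0.9, 0.8, 1.0]))
print(predict(centroids, [0.2, 0.0, 0.1, 0.3]))
```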

12. Car License Plate Detection

This involves locating and extracting the license plate information from a given car image. The extracted information can then be used for various purposes such as vehicle identification and tracking.

13. Viola-Jones Algorithm

This is a classic computer vision method for detecting faces in images and video. The system evaluates simple Haar-like features, computed quickly from an integral image, in a cascade of boosted classifiers to efficiently detect faces in real time.
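Haar-like features are cheap because the sum of any rectangle can be read from an integral image with four lookups. The sketch below builds an integral image and evaluates one two-rectangle feature; the sign convention (left half minus right half) is an assumption of this example.

```python
def integral_image(img):
    """ii[y][x] holds the sum of all pixels above and to the left of (x, y)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four lookups."""
    return (ii[top + height][left + width] - ii[top][left + width]
            - ii[top + height][left] + ii[top][left])


def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of the window (width must be even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


# 4x4 image with a bright left half and a dark right half.
img = [[10, 10, 0, 0] for _ in range(4)]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 4), two_rect_feature(ii, 0, 0, 4, 4))
```

A strong response like this one indicates a vertical edge inside the window, which is the kind of cue the cascade combines to decide whether a window contains a face.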

14. Multi-task Cascaded Convolutional Networks (MTCNN)

This is a face detection system that uses three cascaded convolutional networks to jointly perform face detection and facial landmark localization (face alignment).

15. Super-Resolution

This increases the resolution of an image while preserving its important details. It can be achieved using deep learning models that learn the mapping between low and high resolution images, generating a high-resolution image from a low-resolution one as output.

  • Models: SRResNet, EDSR, RCAN
  • Datasets: DIV2K, Set5, Set14, B100, Urban100
  • Application domain: Image Processing
  • Level: Intermediate
  • Audience Interest level: High
  • Explanation: Super-Resolution using Deep Convolutional Neural Networks
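Super-resolution results are commonly compared against a simple interpolation baseline using peak signal-to-noise ratio (PSNR). The sketch below shows a nearest-neighbor upscaler as the baseline and a PSNR function; it assumes 8-bit grayscale images stored as nested lists and is not a learned model such as those listed above.

```python
import math


def upscale_nearest(img, factor):
    """Baseline upscaling: repeat every pixel `factor` times in each axis."""
    return [
        [img[y // factor][x // factor] for x in range(len(img[0]) * factor)]
        for y in range(len(img) * factor)
    ]


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB (higher means closer to reference)."""
    n = len(reference) * len(reference[0])
    mse = sum(
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ) / n
    return float("inf") if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


low_res = [[0, 100],
           [200, 50]]
upscaled = upscale_nearest(low_res, 2)
print(upscaled)
print(psnr([[100]], [[110]]))
```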



16. Image Restoration

This involves removing degradation such as noise, blur, or over-exposure from an image, to enhance its quality and make it more visually appealing. This is achieved by using techniques such as denoising, deblurring, and inpainting to undo the effects of degradation and recover the original image content.

  • Models: Deep Convolutional Neural Network (DCNN)
  • Datasets: BSD500 dataset, DIV2K dataset
  • Application domain: Image Processing
  • Level: Intermediate
  • Audience Interest level: High
  • Explanation: Image restoration recovers the original content of a degraded image, for example by removing noise, blur, or other defects, to improve its visual quality.
  • Source code: https://github.com/SaoYan/Image-Restoration
  • Research Paper: Image Restoration Using Convolutional Neural Networks (2017), https://arxiv.org/abs/1707.06841
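Before turning to deep models, it helps to see a classical denoising baseline. The sketch below applies a median filter, which is well suited to isolated "salt-and-pepper" pixels; it is an illustration of denoising in general and is not the method of the paper or repository listed above.

```python
def median_filter(img, radius=1):
    """Replace each pixel by the median of its neighborhood.

    The window is clipped at the image border instead of being padded.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            out[y][x] = window[len(window) // 2]
    return out


# Flat gray image corrupted by a single bright outlier in the center.
noisy = [[50] * 5 for _ in range(5)]
noisy[2][2] = 255
restored = median_filter(noisy)
print(restored[2][2])
```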


17. Scene Understanding

This involves recognizing and categorizing different elements in a scene (such as objects, people, and environment) to gain a deeper understanding of the context and meaning of an image. It often combines multiple computer vision tasks, including object detection and semantic segmentation.

  • Models: PointNet, SceneNet RGB-D, SPADE
  • Datasets: NYUv2, ScanNet, SUN RGB-D
  • Application domain: 3D Scene Analysis
  • Level: Intermediate
  • Audience Interest level: Moderate
  • Explanation: Scene Understanding using Deep Learning techniques such as PointNet, SceneNet RGB-D, and SPADE
  • Source code: https://github.com/charlesq34/pointnet


18. Action Recognition

This involves identifying and classifying human actions within video data. It is commonly used in surveillance, sports analysis and human-computer interaction applications.

  • Models: Two-Stream Convolutional Networks, C3D, LSTM, ResNet-50
  • Datasets: UCF101, HMDB51, Kinetics
  • Application domain: Video Processing
  • Level: Intermediate
  • Audience Interest level: High
  • Explanation: Action Recognition using Deep Learning approaches to classify human actions in videos.
  • Codebase: https://github.com/axelbarroso/C3D-keras
  • Research Paper:
    • "Two-Stream Convolutional Networks for Action Recognition in Videos" by Simonyan and Zisserman (2014).
    • "Learning Spatiotemporal Features with 3D Convolutional Networks" by Tran et al. (2015).
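In a two-stream setup, one network looks at RGB frames (appearance) and another at optical flow (motion), and their class scores are combined at the end. The sketch below shows one simple fusion rule, averaging the softmax scores of the two streams; the logits and class names are made-up values for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(spatial_logits, temporal_logits):
    """Late fusion: average the class probabilities of both streams."""
    s, t = softmax(spatial_logits), softmax(temporal_logits)
    return [(a + b) / 2 for a, b in zip(s, t)]


classes = ["walking", "running", "jumping"]
spatial = [2.0, 1.9, 0.1]    # appearance alone slightly prefers "walking"
temporal = [0.5, 3.0, 0.2]   # motion strongly suggests "running"
fused = fuse_streams(spatial, temporal)
predicted = classes[max(range(len(classes)), key=lambda i: fused[i])]
print(predicted)
```

The appearance stream alone would pick "walking", but the motion evidence tips the fused prediction to "running".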

19. Image Style Transfer

This involves transforming an input image to have the same style as a reference image, while retaining its content. It is often achieved by optimizing the output image against content and style losses computed from the feature maps of a Convolutional Neural Network (CNN) pretrained on a large image dataset.

  • Models: Neural Style Transfer (NST)
  • Datasets: MS-COCO, ImageNet
  • Application Domain: Deep Learning
  • Level: Intermediate
  • Audience Interest Level: High
  • Explanation: Image style transfer is a task in computer vision where the style of one image is transferred to the content of another image.
  • Source code: https://github.com/cysmith/neural-style-tf
  • Research Paper: A Neural Algorithm of Artistic Style, Leon A. Gatys et al. (2015) https://arxiv.org/abs/1508.06576
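In the paper above, style is represented by Gram matrices of CNN feature maps, that is, the correlations between channels with spatial position summed out. The sketch below computes a Gram matrix and the per-layer style loss on toy feature maps, assuming each channel is given as a flattened list; extracting real features from a pretrained network is left out.

```python
def gram_matrix(features):
    """features[c] is the flattened feature map of channel c.

    Returns the C x C matrix of inner products between channels.
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def style_loss(generated, style):
    """Per-layer style loss: squared Gram difference, normalized as in the paper."""
    g, s = gram_matrix(generated), gram_matrix(style)
    n = len(generated)       # number of channels
    m = len(generated[0])    # number of spatial positions
    total = sum((g[i][j] - s[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (4 * n ** 2 * m ** 2)


features = [[1, 2], [3, 4]]
shuffled = [[2, 1], [4, 3]]   # same values, spatial positions swapped
print(gram_matrix(features))
print(style_loss(features, shuffled))
```

Swapping the spatial positions leaves the Gram matrix unchanged, which is why this representation captures texture rather than layout.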

20. Gaze Estimation

This involves predicting the direction a person is looking based on their eye and head movements. It is used to understand user attention and to provide interactive control in human-computer interaction systems.

  • Models: GazeNet, DeepGaze II, MPIIGaze
  • Datasets: MPIIFaceGaze, ETRA
  • Application Domain: Computer Vision, Human-Computer Interaction
  • Level: Beginner to Intermediate
  • Audience Interest Level: Moderate
  • Explanation: Gaze Estimation is the process of predicting the direction of a person's gaze based on image or video data. This can be useful for various applications, such as assistive technologies for people with disabilities, human-computer interaction, and user attention analysis.
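Gaze models are usually evaluated by the angle between the predicted and the true gaze direction. The sketch below converts pitch and yaw angles to a 3-D unit vector and computes that angular error; the axis convention used in `angles_to_vector` is one common choice and should be treated as an assumption, since datasets differ.

```python
import math


def angles_to_vector(pitch, yaw):
    """Pitch/yaw in radians to a unit gaze vector (assumed axis convention)."""
    return (-math.cos(pitch) * math.sin(yaw),
            -math.sin(pitch),
            -math.cos(pitch) * math.cos(yaw))


def angular_error_deg(a, b):
    """Angle in degrees between two 3-D gaze vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / norm))   # guard against rounding
    return math.degrees(math.acos(cosine))


truth = angles_to_vector(0.0, 0.0)
prediction = angles_to_vector(0.0, math.radians(10))
error = angular_error_deg(truth, prediction)
print(round(error, 3))
```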



21. Image Generation

This involves generating new images from a given input, typically a noise vector or a sample from a prior distribution. The aim is to capture the underlying structure and patterns of an image dataset so that the generated images are new and diverse, yet similar in style and content to the training data.

Note: These are just some popular examples of Image Generation projects and many more models exist in this domain, some of which are variations of the above mentioned models.

22. Image Captioning

This is a task in computer vision where a textual description is generated for an input image, aimed at explaining its content to a human reader. It uses deep learning models to learn the mapping between image and textual representations.

  • Models: Encoder-Decoder Models, Attention-based Models
  • Datasets: Microsoft COCO, Flickr30k
  • Application Domain: Natural Language Processing
  • Level: Intermediate
  • Audience Interest Level: High
  • Explanation: Image Captioning involves generating a textual description of an image. It is a task at the intersection of computer vision and natural language processing, and is typically performed using Encoder-Decoder models or Attention-based models.


23. Disease detection CV projects

Here are some disease detection projects in Computer Vision and their codebase/research paper links:

With this article at OpenGenus, you should now have a good overview of the different Computer Vision projects you can work on.
