Image recognition 2400-ZEWW989
The course will sequentially cover the following topics:
1. Image preprocessing:
grayscale conversion, normalization, contrast enhancement (e.g., histogram equalization),
noise reduction: Gaussian filtering, median filtering,
resizing, sharpening, blurring,
data augmentation: rotation, flipping, cropping, color jittering.
2. Image classification:
feature extraction: HOG, LBP, SIFT,
classical classification models,
introduction to deep learning,
convolutional neural networks: structure, convolutional layers, pooling,
CNN architectures: LeNet, AlexNet, VGG, ResNet, EfficientNet,
transfer learning and fine-tuning with pre-trained models.
3. Object detection:
sliding window, selective search,
classical object detection methods, HOG + SVM,
modern methods: YOLO family (v3, v5, v8), SSD, Faster R-CNN,
anchor boxes, Intersection over Union, Non-Maximum Suppression.
4. Face recognition and person identification:
face and person verification vs. identification,
face detection: Viola-Jones, MTCNN, Haar cascades, DNN-based detectors,
face embeddings: FaceNet, DeepFace, Dlib.
5. Motion analysis and tracking:
frame differencing, background subtraction,
classical tracking algorithms: Kalman filter, MeanShift, CamShift,
multi-object tracking: SORT, Deep SORT,
optical flow: Lucas-Kanade, Farneback.
Type of course
Course coordinators
Learning outcomes
Students will learn how to prepare images for further analysis, particularly for classification, object detection, face recognition, and person identification. They will gain an understanding of the theoretical foundations of the algorithms used for these tasks, as well as practical experience in implementing them in code. By the end of the course, students will be able to analyse images and select appropriate methods according to the nature of the problem at hand. In addition, they will learn how to evaluate the performance of models tailored to specific tasks. They will also develop an awareness of current challenges and issues in image recognition.
Assessment criteria
The final grade will be based on a take-home project (70% of the grade) and a project presentation (30% of the grade).
The assessment will be both written (project) and oral (project presentation).
Bibliography
Basic:
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Gonzalez, R., & Woods, R. (2017). Digital Image Processing (4th ed.). Pearson.
Shanmugamani, R. (2018). Deep Learning for Computer Vision: Expert techniques to train advanced neural networks using TensorFlow and Keras. Packt Publishing Ltd.
Szeliski, R. (2022). Computer vision: algorithms and applications. Springer Nature.
Supplementary:
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction. Springer Series in Statistics.
Harrington, P. (2012). Machine learning in action. Manning.
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer-Verlag.
Planche, B., & Andres, E. (2019). Hands-On Computer Vision with TensorFlow 2: Leverage deep learning to create powerful image processing apps with TensorFlow 2.0 and Keras. Packt Publishing Ltd.
Additional information
Additional information (registration calendar, class instructors, location and schedule of classes) might be available in the USOSweb system: