This project focuses on building and analyzing image classification models for Korean food datasets, primarily using ResNet architectures. The goal was to classify various Korean food images and then adapt the best-performing model for a health-focused Korean food classification task through transfer learning.
Project duration: Sep 2023 - Dec 2023
Dataset Sources
Note: The final competition submission files, including specific code for the best-performing models, are not uploaded in this repository.
- `kfood_train`: 33,593 images, 42 classes of general Korean food (갈비구이, 갈치구이, 훈제오리, etc.)
- `kfood_val`: 4,198 images, evaluation set for `kfood_train`
- `kfood_health_train`: 14,115 images, 13 classes of health-focused Korean food (갈비찜, 된장찌개, 부대찌개, etc.); used for the transfer learning task
- `kfood_health_val`: 1,764 images, evaluation set for `kfood_health_train`
Data Preparation: An 8:2 train/validation hold-out split was used for dataset preparation.
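A minimal sketch of such an 8:2 split, assuming the images are loaded with `torchvision.datasets.ImageFolder` and divided with `torch.utils.data.random_split` (the path and seed are illustrative; the actual preparation code is not in this repository):

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Hypothetical path; the repository's actual data layout may differ.
full_train = datasets.ImageFolder(
    "kfood_train",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]),
)

# 8:2 train/validation split with a fixed seed for reproducibility.
n_train = int(0.8 * len(full_train))
train_set, val_set = random_split(
    full_train,
    [n_train, len(full_train) - n_train],
    generator=torch.Generator().manual_seed(42),
)
```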
Build a baseline image classification model using ResNet18 on the kfood datasets and analyze its performance, particularly focusing on overfitting.
Model & Training Details
Per the assignment instructions:
- Architecture: ResNet18 (without pre-trained weights)
- Epochs: 50
- Loss Function: Cross Entropy
- Optimizer: SGD (Stochastic Gradient Descent)
- Learning Rate: 0.001
- Batch Size: 32
- Image Preprocessing: Resize to 224x224 pixels (a sketch of this setup follows)
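A minimal sketch of this baseline configuration, assuming standard torchvision APIs and the `train_set` from the split sketch above (the actual training script is not in this repository):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

# ResNet18 with randomly initialized weights (no pre-training),
# head resized to the 42 kfood classes.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 42)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

for epoch in range(50):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```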
Results
- Validation Accuracy: 0.91
- Test Accuracy: 0.6065
- Observation: Significant overfitting was observed: the model performed very well on the data it was trained and validated on, but generalized poorly to the unseen test set (0.91 validation vs. 0.6065 test accuracy).
[Study] ResNet Concepts
Deep Residual Learning for Image Recognition Paper
ResNets address the degradation problem in deep neural networks (where increasing depth leads to higher training error, not just overfitting) by introducing residual learning. Instead of directly learning a mapping $H(x)$, ResNets learn a residual mapping $F(x) = H(x) - x$. This is facilitated by identity shortcut connections, which allow the output of a block to be the sum of the original input and the output of the stacked layers: Output $= F(x) + x$. This structure makes training very deep networks much easier.
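A minimal sketch of a basic residual block expressing Output $= F(x) + x$, where $F$ is two 3x3 convolutions (simplified; torchvision's `BasicBlock` additionally handles strides and projection shortcuts when dimensions change):

```python
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Computes y = F(x) + x, where F is two 3x3 conv layers."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # identity shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))       # F(x)
        return self.relu(out + identity)      # F(x) + x
```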
Enhance the classification accuracy on the kfood datasets by experimenting with deeper ResNet architectures, advanced optimizers, and image augmentation techniques.
Our team's approach: Extensive tuning was performed, including the ResNet34, ResNet50, ResNet101, and ResNet152 architectures, various augmentation combinations, and learning-rate and epoch tuning.
Model & Training Details
- Architecture: ResNet50
- Epochs: 50
- Loss Function: Cross Entropy
- Optimizer: Adam (Adaptive Moment Estimation)
- Learning Rate: 0.005
- Batch Size: 32
- Image Preprocessing: Resize to 224x224 pixels
- Improvements:
(1) Normalization: RGB channel-wise normalization applied to images.
(2) Image Augmentation (for training data): `RandomRotation`, `CenterCrop` (see the sketch below).
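A sketch of what this preprocessing pipeline might look like with torchvision transforms; the rotation degrees, pre-crop resize, and normalization statistics are assumptions (ImageNet statistics are shown as a common choice):

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 256)),   # resize slightly larger before cropping
    transforms.RandomRotation(15),   # degrees are an assumed value
    transforms.CenterCrop(224),      # crop to the 224x224 network input size
    transforms.ToTensor(),
    # Channel-wise RGB normalization; ImageNet stats as an example.
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```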
Results
- Validation Accuracy: 0.71
- Test Accuracy: 0.745
- Architecture: ResNet101
- Epochs: 70
- Optimizer: Adam
- Learning Rate: 0.001
- Batch Size: 64 (to reduce time)
- Image Preprocessing: Resize to 224x224 pixels
- Further augmentation, as in the previous configuration
Results
- Test Accuracy: 0.7918
Utilize the best-performing model from Mission 2 as a pre-trained model for classifying a new dataset of health-focused Korean food images into 13 classes.
Our team's approach: Transfer learning was applied by loading the checkpoint from Mission 2's best model (`mission2.pt`) and adapting it for the new task.
Model & Training Details
- Base Architecture: ResNet101 (pre-trained weights loaded from `mission2.pt`)
- Output Layer Modification: The final Fully Connected (FC) layer was modified to output 13 classes, matching the `kfood_health` dataset.
- Batch Size: 64
- Epochs: 50
- Optimizer: Adam
- Learning Rate: 0.001
Fine-tuning Strategies Explored:
- Train the entire model: All layers' weights were unfrozen and re-trained. (Slightly better performance observed with this method).
- Partial Freezing: Initial layers were frozen, and only later layers were re-trained.
- Linear Probing: All layers of the pre-trained model are frozen and only the final FC layer is trained. (The three strategies are sketched below.)
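A minimal sketch of the three strategies, assuming `mission2.pt` stores the model's `state_dict` (the checkpoint format and the choice of layers to freeze are assumptions):

```python
import torch
import torch.nn as nn
from torchvision import models

# Rebuild the Mission 2 model, load its weights, then swap in a 13-class head.
model = models.resnet101(weights=None)
model.fc = nn.Linear(model.fc.in_features, 42)    # Mission 2 head (42 classes)
model.load_state_dict(torch.load("mission2.pt"))  # assumed to be a state_dict
model.fc = nn.Linear(model.fc.in_features, 13)    # new kfood_health head

# Apply exactly ONE of the strategies below.

# (1) Train the entire model: all parameters stay trainable (the default).

# (2) Partial freezing: freeze the early layers, re-train the rest.
for name, param in model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1", "layer2")):
        param.requires_grad = False

# (3) Linear probing: freeze everything except the new FC layer.
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

# Optimize only the parameters left trainable.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=0.001)
```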
Results
- Test Accuracy: 0.95 (initial) -> 0.98 (final)
[Study] Transfer Learning & Fine-tuning
PyTorch Tutorials: Transfer Learning.
Transfer Learning: A machine learning technique where a model pre-trained on a large dataset for a general task is repurposed as the starting point for a new, related task. This avoids training a model from scratch, leveraging learned features.
Fine-tuning: A specific type of transfer learning where, after the final layers are replaced to match the new task's number of classes, the entire model or a subset of its layers is further trained on the new dataset. Initial layers can optionally be "frozen" (their weights not updated) to preserve general feature extractors.