Let’s Play Lego in a different way!
Authors: Jing Cao, Jieping Yang
LEGO usually sells bricks in sets designed to build a specific object, such as a Lamborghini or the White House. Although the parts come in different shapes, sizes, and colors, many of them are shared across sets. If you have some scattered parts, which sets can you use them in? And if you want to build something new with the lego bricks you have on hand, what could you build? A website called Rebrickable (https://rebrickable.com/) provides a search function with a similar purpose to ours, and one of our datasets was actually collected from the Rebrickable API. However, it doesn’t offer image search. Our project builds an image search engine aimed at users who have no idea what each part is called. After uploading an image of a part, the user gets the name of the part as well as the lego sets it can be used in.
Our search engine is based on image classification. Convolutional Neural Networks (CNNs) are well suited to large-scale image recognition and classification, so let’s build one for our lego brick search engine!
We use two datasets. The first is the lego sets data, which contains detailed information on the lego pieces in each set. The second is the lego image data, which includes images of 50 kinds of lego parts.
Lego sets data: https://www.kaggle.com/rtatman/lego-database
One aim of this project is to check whether a part belongs to a set, so we need to collect the data into one database that includes part_num, part_name, set_num, and set_name. That way, if we know a part’s name or number, we can look up the sets (the specific objects) that use it. The final dataset contains only 1568 objects.
The image dataset contains 50 classes of lego parts. After combining it with the lego sets data, we end up with 32000 images that include 38 kinds of parts, with 800 images for each. This dataset is very limited in the shapes and colors of lego bricks (only grey!), but it is still about 1 GB. We split the data into training, validation, and test sets with a rough ratio of 7:1.5:1.5, so each class has 560 images for training the models.
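To make the split arithmetic concrete, here is a minimal sketch of a per-class 7:1.5:1.5 split. The function name `split_class`, the fixed seed, and the placeholder file names are our own assumptions for illustration; this is not the code from the project repository.

```python
import random

def split_class(items, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle one class's items and cut them into train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever is left goes to the test set
    return train, val, test

# 800 placeholder file names standing in for one part class
images = [f"img_{i:04d}.png" for i in range(800)]
train, val, test = split_class(images)
print(len(train), len(val), len(test))  # 560 120 120
```

With 800 images per class, this gives 560 for training and 120 each for validation and test.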
Cleaning the database
Since the data we need is spread across different tables, we merge the “parts” and “sets” tables through “inventory_parts” and “inventories” on the corresponding ids. After three merges, we get the final database with the “part_name”, “part_num”, “set_name”, and “set_num” variables, which we use to look up information about a classified part.
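The three merges could look roughly like the sketch below, written here as SQL joins on an in-memory SQLite database so it runs with the standard library alone. The column names are assumed to follow the Kaggle CSV files, and the rows are made up purely for illustration; the project itself may have done the merges differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (part_num TEXT, name TEXT);
    CREATE TABLE sets (set_num TEXT, name TEXT);
    CREATE TABLE inventories (id INTEGER, set_num TEXT);
    CREATE TABLE inventory_parts (inventory_id INTEGER, part_num TEXT);
""")

# Made-up rows, for illustration only (not real LEGO data)
conn.executemany("INSERT INTO parts VALUES (?, ?)",
                 [("p1", "Brick 2 x 4"), ("p2", "Plate 1 x 2")])
conn.executemany("INSERT INTO sets VALUES (?, ?)",
                 [("s1", "Toy Car"), ("s2", "Toy House")])
conn.executemany("INSERT INTO inventories VALUES (?, ?)",
                 [(1, "s1"), (2, "s2")])
conn.executemany("INSERT INTO inventory_parts VALUES (?, ?)",
                 [(1, "p1"), (2, "p1"), (2, "p2")])

# Three joins: parts -> inventory_parts -> inventories -> sets
rows = conn.execute("""
    SELECT DISTINCT p.part_num, p.name AS part_name,
                    s.set_num, s.name AS set_name
    FROM parts AS p
    JOIN inventory_parts AS ip ON ip.part_num = p.part_num
    JOIN inventories AS i ON i.id = ip.inventory_id
    JOIN sets AS s ON s.set_num = i.set_num
    ORDER BY p.part_num, s.set_num
""").fetchall()

for row in rows:
    print(row)
```

Each output row pairs one part with one set that contains it, which is the shape of table we need for the search step later.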
We tried two models for classification. One is the popular ResNet50.
The other is our customized model, whose architecture is shown in Fig. 2. To help prevent overfitting, we also added a Dropout layer with a drop probability of 0.5. The model has 42,506,726 parameters to train.
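For readers who haven’t met Dropout before, here is a minimal plain-Python sketch of what such a layer does during training, assuming the common “inverted dropout” formulation. It is only an illustration of the idea, not our model code; the helper name `dropout` is hypothetical.

```python
import random

def dropout(activations, rate=0.5, rng=None):
    """Inverted dropout at training time: zero each value with probability
    `rate` and scale the survivors by 1 / (1 - rate)."""
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - rate)
    return [a * keep_scale if rng.random() >= rate else 0.0
            for a in activations]

rng = random.Random(42)  # fixed seed so the run is repeatable
out = dropout([1.0] * 10, rate=0.5, rng=rng)
print(out)  # every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

Because a different random subset of units is dropped at every step, the network cannot rely too heavily on any single unit, which is why the layer acts as a regularizer.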
We ran both models on two versions of the data. One is the original image data, rescaled by 1/255 so that every pixel value lies between 0 and 1. The other is the image data with augmentation, more specifically data warping. Data warping increases the amount of data by adding modified copies of the pictures, and it has been shown to be an effective form of regularization that helps prevent overfitting. Note that we only augmented the training data. A sample of data warping is shown in Fig. 3.
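Here is a small sketch of the two preprocessing ideas, using a tiny made-up grayscale image stored as nested lists. Dividing 8-bit pixel values by 255 maps them into the range 0 to 1. The two warps shown (a horizontal flip and a brightness shift) are our assumptions for illustration and not necessarily the exact set of transformations used in the project.

```python
def rescale(image):
    """Map 8-bit pixel values (0..255) into the range 0..1."""
    return [[pixel / 255.0 for pixel in row] for row in image]

def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def shift_brightness(image, delta):
    """Add `delta` to every 8-bit pixel, clipping the result to 0..255."""
    return [[min(255, max(0, pixel + delta)) for pixel in row]
            for row in image]

# A tiny 2x3 grayscale "image" with made-up values
image = [[0, 128, 255],
         [64, 192, 32]]

# Warp first (on 8-bit values), then rescale for the network
warped = shift_brightness(horizontal_flip(image), delta=40)
scaled = rescale(warped)
print(warped)  # [[255, 168, 40], [72, 232, 104]]
```

Notice that the brightest pixel is clipped at 255 after the brightness shift, so some detail is lost in the warped copy.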
Let’s train the model!
Predicting new lego pieces
Once the model works, we can get the part number from an image uploaded by the user. The output is the names of the sets that can be built with this specific part. In this way we achieve our original goal: telling users who want to build something new with the lego bricks they have on hand what they could build.
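The lookup step can be sketched as follows, assuming the classifier has already returned a part number. The rows are made up and stand in for the merged part/set table; `build_index` and `lookup` are hypothetical helper names, not functions from our repository.

```python
from collections import defaultdict

# Made-up (part_num, part_name, set_num, set_name) rows
rows = [
    ("p1", "Brick 2 x 4", "s1", "Toy Car"),
    ("p1", "Brick 2 x 4", "s2", "Toy House"),
    ("p2", "Plate 1 x 2", "s2", "Toy House"),
]

def build_index(rows):
    """Group set names by part number and remember each part's name."""
    part_names = {}
    sets_by_part = defaultdict(list)
    for part_num, part_name, _set_num, set_name in rows:
        part_names[part_num] = part_name
        if set_name not in sets_by_part[part_num]:
            sets_by_part[part_num].append(set_name)
    return part_names, sets_by_part

def lookup(predicted_part_num, part_names, sets_by_part):
    """Return the part name and the sets that use the predicted part."""
    return (part_names.get(predicted_part_num),
            sets_by_part.get(predicted_part_num, []))

part_names, sets_by_part = build_index(rows)
print(lookup("p1", part_names, sets_by_part))
# ('Brick 2 x 4', ['Toy Car', 'Toy House'])
```

An unknown part number simply returns no name and an empty list of sets.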
If you are interested in our search engine, please feel free to check out our code at [https://github.com/jingcao33/lego-bricks-image-classification].
Results and Discussion
ResNet50 achieved a very low accuracy of 0.1336, which surprised us. We also trained it on the augmented data to counter overfitting, and the accuracy turned out even lower at 0.0825. Our customized model did much better, with an accuracy of 0.6147 on the original data and 0.3972 on the augmented data within 30 epochs.
Looking at the training and validation accuracy over time in Fig. 4 and 5, the interesting thing is that ResNet50 doesn’t show severe overfitting on the original data but does overfit when trained on the augmented data. The customized model, on the other hand, behaves as intuition suggests: it overfits on the original data but not on the augmented data (where it is still underfitting at 30 epochs). Within 30 epochs, the customized model fitted on the original data achieved better accuracy than the one fitted on the augmented data. But the latter could achieve better results, without overfitting, if we had time to run more epochs.
In general, none of the models performed well on lego image classification, especially on the augmented data. Our customized model, although much simpler, works better than ResNet50 in this project. We think the main reason is the lack of original data: the training data for one class consists of 560 pictures of a single lego piece from different angles. More original data with more detail would likely help a complex convolutional neural network, but warping the images to generate more data may not be useful in this case, because some of the warped images lose information due to the brightness changes (which imitate real pictures).
Besides, ResNet50 is pre-trained on the ImageNet dataset (Russakovsky et al., 2015), which includes animals (lion, cat, bird, etc.), foods (strawberry, orange, etc.), and objects (minivan, racket, mug, etc.). The dataset (last checked at 19:00 on Dec. 16th, 2020) has very limited data related to building materials (96) and even fewer pictures of building pieces similar to lego pieces (e.g., bricks). It is therefore reasonable to get a very low accuracy from parameters trained on data unrelated to lego pieces. In the future, if we still want to benefit from this kind of pre-trained model, we can unfreeze the last few layers of the base model and train them together with the customized part, fine-tuning the network on our own data.
Overall, this project is a pilot trial for lego part classification with limited data and low computational power. There is a lot of room for improvement in future work:
- Prepare more real pictures, a lot more!
- Increase the size of our customized model by adding more layers or neurons, or tune hyperparameters such as the size of the pooling layer, the number of epochs, and the learning rate.
- Try other pre-trained models for fine-grained image classification, such as TBMSL-Net (Zhang et al., 2020) and DAT (Ngiam et al., 2018).
- Create a web page that allows users to upload pictures!
References
Ngiam, J., Peng, D., Vasudevan, V., Kornblith, S., Le, Q. V., & Pang, R. (2018). Domain adaptive transfer learning with specialist models. arXiv preprint arXiv:1811.07056.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., … & Berg, A. C. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Zhang, F., Li, M., Zhai, G., & Liu, Y. (2020). Multi-branch and multi-scale attention learning for fine-grained visual categorization. arXiv preprint arXiv:2003.09150v3.