Instagram, an online photo-sharing and social-networking service, is becoming increasingly powerful in enabling fashion brands to ramp up their business growth. Nowadays, a single post by a fashion influencer attracts a wealth of attention and a multitude of followers who are curious to know more about the brand and style of each clothing item in the image. To this end, the development of efficient deep CNN models that can accurately detect styles and brands has become a research challenge. In addition, current techniques need to cope with inherent fashion-related data issues: clothing details inside a single image cover only a small proportion of the large and hierarchical space of possible brands and clothing item attributes. To cope with these challenges, one can argue that neural classifiers should be adapted to large-scale and hierarchical fashion datasets.
As a remedy, we propose two novel techniques that incorporate valuable social media textual content to support visual classification in a dynamic way. The first method, adaptive neural pruning (DynamicPruning), uses the clothing category detected from a post's text to activate the relevant range of connections in the clothing attributes' classifier. The second method (DynamicLayers) is a dynamic framework in which multiple attribute classification layers exist and a suitable attribute classifier layer is activated dynamically based on the text mined for the image. Extensive experiments on a dataset gathered from Instagram and a baseline fashion dataset (DeepFashion) have demonstrated that our approaches can improve accuracy by about 20% compared to base architectures. It is worth highlighting that with DynamicLayers we gained 35% accuracy on the task of multi-class multi-labeled classification compared to the other model.
As part of the implementation of our fashion and style recommendation framework described in , we have the task of classifying an image into a large space of possible fashion brands, clothing categories, sub-categories and style attributes. For example: Dress (category) – Casual (sub-category) – Satin, Floral (attributes) from Zara (brand). Given that there might be more than one clothing item in the image, the task requires multi-class multi-labeled classification. To tackle these challenges, dynamic models need to be introduced to handle different types of clothing categories. Recently, deep learning frameworks such as PyTorch and TensorFlow Fold have started to support Dynamic Computational Graphs (DCGs), also known as "define by run" graphs. With DCG support, a deep learning architecture graph can be defined at runtime, and backpropagation can then use the dynamically built graph. This simplifies the implementation of deep learning models that operate over data of varying size, or data whose structure differs from one input to another. Introducing dynamic computational graphs opens the way to novel inference algorithms that speed up training by taking advantage of these flexible models. In our solution, we have multiple "sub-category" and "attribute" layers, each related to a certain clothing category, and different categories might have different attributes. Strong hints from text analysis about the possible categories, sub-categories and attributes, together with their confidence scores, help us choose dynamically which sub-category and attribute classification layers to connect to. In our work, we present DynamicCNN: a novel application of DCGs using text mining. The size of a neural network can greatly affect its generalization and inference speed. Large networks might learn very slowly and might be sensitive to the initial parameters.
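The routing idea behind such dynamic layer selection can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's architecture: the category names, layer sizes and softmax head are invented for illustration. A shared feature vector is routed, at runtime, to the attribute head chosen by the category mined from the post's text, in define-by-run style.

```python
import numpy as np

# Hypothetical sketch of dynamic head selection: one shared feature
# extractor feeds several attribute-classifier heads, and the head is
# chosen at runtime from the clothing category mined from the post text.
# Category names and dimensions are illustrative assumptions.

rng = np.random.default_rng(0)
FEATURE_DIM = 16

# One attribute head per clothing category; heads differ in size
# because different categories have different attribute sets.
HEADS = {
    "dress": rng.standard_normal((FEATURE_DIM, 5)),  # e.g. satin, floral, ...
    "shoes": rng.standard_normal((FEATURE_DIM, 3)),  # e.g. leather, heel, ...
}

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def classify_attributes(image_features, text_category):
    """Route the shared features through the head selected by the
    category detected in the post's text (define-by-run style)."""
    head = HEADS[text_category]  # the graph is decided at runtime
    return softmax(image_features @ head)

features = rng.standard_normal(FEATURE_DIM)  # stand-in for CNN features
probs = classify_attributes(features, "dress")
```

In a real DCG framework such as PyTorch, the same branching would simply be an `if` or dictionary lookup inside the model's forward pass, with gradients flowing only through the head that was actually used.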
Neural pruning is one possible way to control the number of redundant network parameters or feature maps that contribute little to the prediction output. This consequently affects the network size and possibly the inference speed. A dynamic approach to deciding which parameters are more or less important for a given task can greatly help in this context. Usually, pruning identifies unnecessary parameters by calculating the sensitivity of the error to their removal; in our case, however, we use the text detected by our text analysis to dynamically control which parameter connections are deactivated for a given input (our DynamicPruning model). Continuing the clothing classification example: instead of having multiple sub-category or attribute layers, we experimented with one large sub-category layer holding all values from all possible sub-categories, and another layer combining all possible attributes from the different clothing categories. We then deactivate, dynamically and based on the category detected in the image, the parameter connections that are irrelevant to that category. Our contributions in this paper are two novel techniques (DynamicPruning and DynamicLayers) that incorporate social media textual content to support visual classification in neural networks in a dynamic way. We evaluated our models on two large fashion-specialised datasets, and our experiments demonstrated the improvements both models achieve in classification accuracy. In this paper, we evaluate the proposed techniques on basic CNN architectures to highlight the added value of text integration in such models and to pave the way for integrating text support with our proposed architecture described in . Our evaluation of the mentioned architecture will be in comparison to popular related models such as .
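The masking mechanism behind this kind of text-driven pruning can be sketched as follows. The attribute vocabulary and the category-to-attribute mapping are invented for illustration: a single large attribute layer covers all categories, and a binary mask derived from the text-detected category zeroes the weight columns of attributes that cannot apply to that category.

```python
import numpy as np

# Hypothetical sketch of text-driven connection pruning: one large
# attribute layer spans every category's attributes, and a binary mask
# built from the category detected in the post's text deactivates the
# columns that are irrelevant to that category. The vocabulary and
# mapping below are illustrative assumptions.

ATTRIBUTES = ["satin", "floral", "maxi", "leather", "heel", "lace-up"]
CATEGORY_TO_ATTRS = {
    "dress": {"satin", "floral", "maxi"},
    "shoes": {"leather", "heel", "lace-up"},
}

rng = np.random.default_rng(1)
FEATURE_DIM = 16
W = rng.standard_normal((FEATURE_DIM, len(ATTRIBUTES)))  # one big layer

def attribute_scores(features, text_category):
    """Deactivate connections to attributes outside the detected
    category before computing the layer's output."""
    keep = np.array([a in CATEGORY_TO_ATTRS[text_category]
                     for a in ATTRIBUTES])
    masked_W = W * keep        # zero the columns of irrelevant attributes
    return features @ masked_W  # pruned attributes score exactly 0

features = rng.standard_normal(FEATURE_DIM)
scores = attribute_scores(features, "shoes")
```

Because the mask only multiplies the forward pass, the same trick keeps gradients from flowing into the pruned connections during training, which is what lets the single large layer specialise per category.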