For the past few days, we have been working on building our own dataset. Today it is time to do something interesting with it: transform it into a graph.
Graphs are a great way to discover relationships, but not all data can be represented this way.
Imagine that you have an e-commerce site with thousands of products. A nice way to discover how similar one product is to another is by using graphs.
Let’s take the fashion industry as an example. I have a dataset here with information about fabrics: who designed them, the name of the company that manufactured them, the collection, the composition, whether they are certified, the type of clothing you can make with them, the colors and patterns, etc.
I have transformed all of this into a graph, and this is the result (the whole network is huge, so some nodes and edges are hidden for better visualisation).
Here, you can see that the fabric “Turo Tweed” is made of tweed and wool blends and can be used to make coats and trousers. It was designed by Turo Fabrics UK, who also produces the fabric “Superior Lining”, which is also a great fit for coats and jackets.
This second cluster has four fabrics with a solid pattern, but with different applications. The “Solid Laguna Jersey” and the “Solid Laguna Jersey (Chocolate)” are great for designing leggings and are part of the same collection. The “Westerly Natural White” and the “Sweater Weight Wool Jersey” are manufactured by the same company, but have different patterns.
So far, we have only analysed a fraction of the network. This is the full picture:
So, here are a few tips for this challenge:
- Ask yourself which relationships exist in your data.
- Pre-process your data so these relationships are expressed in a “Subject -> verb -> object” format.
- Transform these into a graph using a package like NetworkX.
- Use a visualisation software (I am using Cytoscape) to explore your network.
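The steps above can be sketched in a few lines of NetworkX. The triples below are illustrative (the relation names like `made_of` and `designed_by` are my own labels, not the exact schema of the dataset), but the pattern is the same: each “Subject -> verb -> object” triple becomes an edge, with the verb stored as an edge attribute, and the result is exported to GraphML, a format Cytoscape can import directly.

```python
import networkx as nx

# Hypothetical subject -> verb -> object triples extracted during
# pre-processing (fabric names from the post; relation labels are mine).
triples = [
    ("Turo Tweed", "made_of", "tweed"),
    ("Turo Tweed", "made_of", "wool blend"),
    ("Turo Tweed", "used_for", "coats"),
    ("Turo Tweed", "designed_by", "Turo Fabrics UK"),
    ("Superior Lining", "designed_by", "Turo Fabrics UK"),
    ("Superior Lining", "used_for", "coats"),
]

# Build a directed graph; the verb becomes the edge's "label" attribute.
G = nx.DiGraph()
for subject, verb, obj in triples:
    G.add_edge(subject, obj, label=verb)

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")

# Export to GraphML so the network can be opened in Cytoscape.
nx.write_graphml(G, "fabrics.graphml")
```

Because “Turo Tweed” and “Superior Lining” share the nodes “Turo Fabrics UK” and “coats”, the two fabrics end up connected through them, which is exactly the kind of similarity the clusters above reveal.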
A great place to start is reading the article “Python NLP Tutorial: Building A Knowledge Graph using Python and SpaCy” written by Marius Borcan.
Over the next few days, I will explore the data for at least one hour per day and post the notebooks, data and models, when they are available, to this repository.
Do you want to connect? It would be a pleasure to discuss Machine Learning with you. Drop me a message on LinkedIn.