How to use graphs to better grasp data?

#100daysofdata

Have you already considered building graphs from a dataset that you own? Graphs are a great way to discover relationships, but not every dataset can be represented this way. Imagine that you have an e-commerce site with thousands of products. A nice way to discover how similar one product is to another is using […]
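The excerpt above is truncated, but the general idea can be sketched: treat products as nodes and connect the ones whose feature vectors look similar. The snippet below is a minimal illustration of that idea, not the post's actual notebook; the product names, the random feature vectors, and the 0.8 similarity threshold are all made-up assumptions.

```python
# Minimal sketch: build a similarity graph between products.
import networkx as nx
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical products and feature vectors (e.g. TF-IDF of their descriptions)
products = ["laptop", "mouse", "keyboard", "blender"]
features = np.random.rand(len(products), 16)

similarity = cosine_similarity(features)

graph = nx.Graph()
graph.add_nodes_from(products)
for i in range(len(products)):
    for j in range(i + 1, len(products)):
        if similarity[i, j] > 0.8:  # assumed similarity threshold
            graph.add_edge(products[i], products[j], weight=similarity[i, j])

# Products connected to "laptop" are candidates for a "similar items" box
print(list(graph.neighbors("laptop")))
```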

Explore the dataset you created

#100daysofdata

Today’s challenge is a continuation of yesterday’s task. If you have created a dataset, even a small one, today is the time to explore it. Again, since every project is different and you may choose to build a dataset of basically anything, there’s no notebook to share today 🙁 From my side, I […]
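Since the post shares no notebook, here is only a generic sketch of a first exploration pass with pandas; the file name and the columns it touches are placeholders.

```python
# Minimal first-pass exploration of a freshly created dataset with pandas.
import pandas as pd

df = pd.read_csv("my_dataset.csv")  # hypothetical file

print(df.shape)          # number of rows and columns
print(df.dtypes)         # column types
print(df.isna().sum())   # missing values per column
print(df.describe())     # summary statistics for numeric columns
```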

Create a dataset

#100daysofdata

Today’s challenge is about creating your own dataset. I have already discussed how important it is to understand this topic: since a lot of introductory Machine Learning examples use toy datasets, we don’t always have the opportunity to face the issues of creating our own dataset. Since every project is different and you may choose to […]

NLP – Fetch data using RSS feeds

#100daysofdata

Getting data for textual analysis should not be limited to scraping websites. There are many ways to get valuable information from websites, like fetching data from RSS feeds. But RSS is not that easy to find anymore. Mostly because, well… who reads it? I do… I still use solutions like Feedly to read articles, but […]
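For readers who have never fetched a feed programmatically, here is a minimal sketch using the feedparser library; the feed URL is a placeholder, and the fields accessed are the ones feeds commonly expose.

```python
# Minimal sketch: fetch article metadata from an RSS feed with feedparser.
import feedparser

feed = feedparser.parse("https://example.com/feed.xml")  # hypothetical feed URL

for entry in feed.entries:
    # Each entry typically exposes a title, a link and a summary
    print(entry.title)
    print(entry.link)
    print(entry.get("summary", ""))
```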

Clustering text to find different themes

#100daysofdata

Today’s problem is a continuation of yesterday’s. Yesterday, I used Gensim to create a summary of some internet articles in order to create a business plan. However, sometimes you end up with a lot of text. Some of it will make sense and some of it will be garbage. I thought I could use clustering […]
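The excerpt does not say which clustering method the notebook uses, so the sketch below shows one common approach as an assumption: TF-IDF features plus k-means, with the sample texts invented for illustration.

```python
# Minimal sketch: cluster short texts with TF-IDF features and k-means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

texts = [
    "opening a coffee shop downtown",
    "espresso machines and roasting beans",
    "asdf lorem ipsum random noise",      # garbage-looking text
    "marketing plan for a small cafe",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(texts)
labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(vectors)

for text, label in zip(texts, labels):
    print(label, text)
```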

Extractive text summarization: creating a business plan using this technique

#100daysofdata

Last week, after some personal reflection and feedback I received, I decided to change the format of this challenge. I came to the conclusion that it would be more enjoyable to have real examples of applying data manipulation, rather than just exemplifying the capabilities of some tools. So, today, let me explore how I […]
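As a rough illustration of extractive summarization, the sketch below uses Gensim's TextRank-based summarizer. It assumes gensim < 4.0 (the summarization module was removed in 4.0), and the input text is only a placeholder; a real article would be much longer.

```python
# Minimal sketch: extractive summarization with gensim's TextRank summarizer.
# Assumes gensim < 4.0, where gensim.summarization is still available.
from gensim.summarization import summarize

article = """
Extractive summarization selects the most representative sentences from a text.
The algorithm builds a graph where each sentence is a node.
Edges in the graph connect sentences that share many words.
A ranking algorithm similar to PageRank scores every sentence in the graph.
Sentences with the highest scores are selected for the summary.
The summary therefore reuses original sentences instead of generating new ones.
This graph-based ranking of sentences is known as TextRank.
TextRank works well when sentences share enough words to form a connected graph.
Longer texts usually give the ranking algorithm more sentences to work with.
The selected sentences are finally returned in their original order.
"""

# Keep roughly 30% of the sentences as the summary
print(summarize(article, ratio=0.3))
```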

Transfer learning (and a reflection on this 100 days challenge)

#100daysofdata

Transfer learning is a way to reuse already-trained models to improve the performance of a new model being trained. Today, we will explore this concept. BUT FIRST, it’s time for a brief reflection on this challenge. When I decided to start the #100DaysOfTensorflow challenge, I had two main goals: to discover features about this […]
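As a quick illustration of the concept, here is a minimal Keras sketch: a pretrained image model is reused as a frozen feature extractor and only a new classification head is trained on top. The input size and the two output classes are assumptions, not the post's actual setup.

```python
# Minimal sketch: transfer learning with a frozen pretrained base in Keras.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the pretrained weights

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # hypothetical 2 classes
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```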

Ensemble modelling

#100daysoftensorflow

Sometimes, it’s hard to get good predictions with a single model. In these cases, you can train several models with different architectures and combine their predictions. During the next days, I will explore Tensorflow for at least 1 hour per day and post the notebooks, data and models to this repository. […]
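One simple way to combine models, shown below as a hedged sketch rather than the post's actual notebook, is to average the predicted probabilities of several independently trained models; the architectures and the random data are placeholders.

```python
# Minimal sketch: average the predictions of several Keras models.
import numpy as np
import tensorflow as tf

def make_model(units):
    # Hypothetical binary classifier on 20 input features
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(20,)),
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

x_train = np.random.rand(100, 20)
y_train = np.random.randint(0, 2, size=(100, 1))
x_test = np.random.rand(10, 20)

# Train a few models with different architectures
models = [make_model(units) for units in (16, 32, 64)]
for model in models:
    model.fit(x_train, y_train, epochs=3, verbose=0)

# Average their predicted probabilities to get the ensemble output
ensemble_pred = np.mean([m.predict(x_test, verbose=0) for m in models], axis=0)
print(ensemble_pred.ravel())
```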

Retrain model

#100daysoftensorflow

For today’s challenge, we won’t make many changes compared to the previous ones we have already made. Let’s try one last thing with our corpus before retraining the model. During the next days, I will explore Tensorflow for at least 1 hour per day and post the notebooks, data and models to this repository. Today’s notebook […]

More exploratory analysis (NLP) Part 3

#100daysoftensorflow

For today’s challenge, we will continue to navigate through our data using exploratory techniques. Textual data is so rich that it is worth this kind of in-depth analysis. During the next days, I will explore Tensorflow for at least 1 hour per day and post the notebooks, data and models to this repository. Today’s notebook […]
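As one tiny example of the kind of exploratory step this post refers to, the sketch below counts the most frequent tokens in a corpus; the sample sentences are placeholders, not the challenge's actual data.

```python
# Minimal sketch: count the most frequent tokens in a small text corpus.
from collections import Counter
import re

corpus = [
    "TensorFlow makes it easy to build models",
    "Exploring textual data reveals common words and patterns",
    "Common words dominate most corpora",
]

tokens = []
for text in corpus:
    tokens.extend(re.findall(r"[a-z]+", text.lower()))

print(Counter(tokens).most_common(5))
```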