Lima Vallantin
Marketing Data scientist and Master's student interested in everything concerning Data, Text Mining, and Natural Language Processing. Currently speaking Brazilian Portuguese, French, English, and a tiiiiiiiiny bit of German. Want to connect? You can send me a message. For more information about me, you can visit this page.



What’s text mining? Think back to when you were a baby and try to recall your first contact with language.

It was surely through the spoken word. Then, you went to school and learned how to read. Words are so natural to us that we tend to forget that much of the available data is actually encoded in textual form. So, we need a way to access this packaged knowledge. That’s where text mining comes in.

What’s text mining?

Text mining, text processing, and text analytics are more or less the same thing. We use these terms to describe the semi-automated process for knowledge extraction from unstructured textual data sources.

We usually say that information contained in a text is unstructured. An example of structured information is a spreadsheet, where data is neatly organized into rows and columns.

In order to access data in an unstructured format, you need a methodology such as text mining. I call it a methodology because text analytics is more of a strategy than a single technique. You can use tools such as machine learning models, Natural Language Processing (NLP) tools, Named-Entity Recognition (NER), Part-of-Speech (POS) tagging, etc.

In a nutshell, text mining and text analytics are the tasks of going through different documents grouped into a corpus to discover hidden and useful information.
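To make the idea of "going through documents grouped into a corpus" more concrete, here is a minimal sketch that counts the most frequent terms across a small corpus. The three documents and the stop-word list are invented for illustration, and the tokenizer is a deliberately naive assumption (lowercase alphabetic runs only); a real project would more likely rely on a library such as NLTK or spaCy.

```python
import re
from collections import Counter

# Hypothetical three-document corpus, invented for illustration.
corpus = [
    "The delivery was late and the package was damaged.",
    "Great product, fast delivery!",
    "The product stopped working after a week.",
]

# A tiny, hand-picked stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "and", "was", "a", "after"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(documents):
    """Count how often each non-stop-word appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


counts = term_counts(corpus)
# The two most frequent terms hint at what the corpus is about.
print(counts.most_common(2))
```

Even a simple count like this starts to surface what customers talk about most, which is the kind of "hidden" information text mining is after.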

What is the difference between text mining and text analytics?

In his course, Text Mining and Analytics, ChengXiang Zhai explains that there’s no practical difference between them, but some people prefer to use text mining when they refer to the practical process of getting and processing the data, while text analytics would be more related to the analysis of the data.

According to Ignatow and Mihalcea [1], text mining has its roots in computer science, while text analysis is related to social science.

In general, the differences between both disciplines are described in the following way:

Text Mining

The process of collecting and preparing textual data. It makes use of the Extract, Transform, Load (ETL) framework and relies on programming languages such as R and Python for these tasks. It’s related to the extraction of raw data and its normalization.
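As a rough illustration of the extract, transform, and load steps described above, the sketch below normalizes two raw strings and stores them in an in-memory SQLite table. Everything here is an assumption made for the example: the raw records, the `documents` table name, and the specific normalization choices (stripping HTML-like tags, lowercasing, collapsing whitespace). Real pipelines usually pull from files, APIs, or scrapers and apply more careful cleaning.

```python
import html
import re
import sqlite3

# Hypothetical raw records, standing in for scraped pages or exported emails.
raw_records = [
    "<p>Hello,   my ORDER hasn&#39;t arrived!</p>",
    "<div>Thanks for the <b>quick</b> reply.</div>",
]


def extract(records):
    """Extract step: here the 'source' is just an in-memory list."""
    yield from records


def transform(text):
    """Transform step: drop tags, decode entities, lowercase, collapse spaces."""
    text = re.sub(r"<[^>]+>", " ", text)  # remove HTML-like tags
    text = html.unescape(text)            # decode entities such as &#39;
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def load(rows, connection):
    """Load step: store the normalized text in a SQLite table."""
    connection.execute("CREATE TABLE IF NOT EXISTS documents (body TEXT)")
    connection.executemany(
        "INSERT INTO documents (body) VALUES (?)", [(row,) for row in rows]
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
load((transform(record) for record in extract(raw_records)), connection)

# Read the normalized documents back in insertion order.
normalized = [
    row[0]
    for row in connection.execute("SELECT body FROM documents ORDER BY rowid")
]
print(normalized)
```

Keeping the three steps as separate functions makes it easy to swap the source or the storage later without touching the normalization logic.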

Text Analytics

The use of statistical and machine learning models to extract insights from text. It can rely on the same programming languages as text mining, but also on analytics tools like Power BI. It’s related to the analysis of the normalized data.
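One classic statistical technique applied to normalized text is TF-IDF weighting, which scores a term higher when it is frequent in one document but rare in the rest of the corpus. The sketch below is a hand-rolled, unsmoothed variant (term frequency times the natural log of N divided by document frequency), written only to show the idea. The documents are invented, and libraries such as scikit-learn use slightly different, smoothed formulas, so the numbers will not match theirs.

```python
import math
from collections import Counter

# Hypothetical, already-normalized documents (tokens separated by spaces).
documents = [
    "late delivery damaged package",
    "fast delivery great product",
    "product stopped working",
]


def tf_idf(docs):
    """Return one {term: score} dict per document (unsmoothed TF-IDF)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


scores = tf_idf(documents)
# Terms unique to the first document outrank "delivery", which is shared.
for term, score in sorted(scores[0].items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.3f}")
```

In the first document, "delivery" also appears in the second one, so it scores lower than "late", "damaged", or "package", which appear nowhere else.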

What are text mining and text analytics applications?

Text mining and text analysis can be used to extract high-quality information from text and to understand how language is used and how the world organizes itself around language.

Textual analysis provides useful information about the person who wrote or said something, especially in the case of social media.

Since it’s a computer science method, text mining can deal with large-scale text extraction and processing. This allows companies to extract business insights from internal documents and from exchanges with customers by email, social media, or chatbots.

For the marketing industry, it’s possible to use text mining and text analytics for sentiment detection and customer service management. According to an article published on the site MonkeyLearn [2], text analytics can be used to automatically tag tickets, detect languages, etc.

In education, Oliveira, Azevedo, and Gomes [3] discuss the use of text mining tools to evaluate students’ written work and help teachers identify the gaps between what is being taught and what students actually assimilate.
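To give a feel for what ticket tagging and sentiment detection involve, here is a deliberately simple, rule-based sketch. The tag keywords and the sentiment word lists are hypothetical and invented for this example. Commercial tools typically use trained machine learning classifiers rather than keyword matching, so treat this as a stand-in for the idea, not as how any particular product works.

```python
import re

# Hypothetical tag keywords and sentiment lexicon, invented for this sketch.
TAG_KEYWORDS = {
    "billing": {"invoice", "refund", "charge"},
    "shipping": {"delivery", "package", "tracking"},
}
POSITIVE = {"great", "thanks", "love"}
NEGATIVE = {"late", "broken", "damaged"}


def tokens(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def tag_ticket(text):
    """Return every tag whose keyword set overlaps the ticket's words."""
    words = tokens(text)
    return sorted(tag for tag, keywords in TAG_KEYWORDS.items() if words & keywords)


def sentiment(text):
    """Compare counts of positive and negative lexicon words."""
    words = tokens(text)
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


ticket = "My package is late, please send a refund"
print(tag_ticket(ticket), sentiment(ticket))
```

A keyword approach like this breaks down quickly with negation ("not great") or sarcasm, which is one reason trained models are usually preferred in practice.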

What’s text mining: the tools

The main tool to analyze text will be a programming language such as Python or R. In Python, you can use libraries such as NumPy, pandas, NLTK, Gensim, spaCy, scikit-learn, BeautifulSoup, Scrapy, PyTorch, TensorFlow, and many others.

You can also use more visual tools to analyze text, such as RapidMiner, Weka, or Power BI.

How to learn text mining?

You can start by taking some courses such as the ones below:

Or read the articles below:


  1. Ignatow G, Mihalcea R. An Introduction to Text Mining: Research Design, Data Collection, and Analysis. SAGE Publications; 2017.
  2. What Is Text Mining? A Beginner’s Guide. MonkeyLearn. Accessed January 26, 2021.
  3. Gonçalves de Oliveira L, Terra Azevedo BF, Basílio Almeida Gomes C. Softwares de mineração de texto na análise de produções textuais de estudantes do ensino fundamental: possibilidades interdisciplinares. 752. Accessed January 28, 2021.
