What’s text mining?
Think back to when you were a baby and to your first contact with language. It was surely through the spoken word. Then you went to school and learned how to read. Words are so natural to us that we tend to forget that most of the available data is actually encoded in textual form. So, we need a way to access this packaged knowledge. That’s where text mining comes in.
What’s text mining?
Text mining, text processing, and text analytics are more or less the same thing. We use these terms to describe the semi-automated process for knowledge extraction from unstructured textual data sources.
We usually say that the information contained in a text is unstructured. By contrast, a spreadsheet is an example of structured information: the data is organized into labeled rows and columns.
In order to access data in an unstructured format, you need a methodology such as text mining. I call it a methodology because text analytics is more of a strategy than a single technique. It can draw on tools such as machine learning models and Natural Language Processing (NLP) techniques like named-entity recognition (NER) and part-of-speech (POS) tagging.
In a nutshell, text mining and text analytics are the tasks of going through different documents grouped into a corpus to discover hidden and useful information.
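As a rough illustration of what "going through a corpus" can mean in practice, the sketch below counts the most frequent terms across a handful of documents. The three sample documents, the stop-word list, and the function names are all assumptions made up for this example, and it uses only the Python standard library rather than any of the dedicated text mining packages.

```python
import re
from collections import Counter

# Hypothetical three-document corpus, invented for illustration.
corpus = [
    "The delivery was late and the package arrived damaged.",
    "Great service, the delivery was fast.",
    "The package was damaged, so I asked for a refund.",
]

# A tiny, hand-picked stop-word list (an assumption); real projects
# usually rely on a much fuller one.
STOP_WORDS = {"the", "was", "and", "a", "so", "i", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(documents):
    """Count how often each non-stop-word appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts


frequencies = term_frequencies(corpus)
print(frequencies.most_common(3))
```

Even this simple count surfaces something that no single document states outright: deliveries and damaged packages are recurring themes in the corpus.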
What is the difference between text mining and text analytics?
In his course, Text Mining and Analytics, ChengXiang Zhai explains that there’s no practical difference between them, but some people prefer text mining when they refer to the practical process of getting and processing the data, while text analytics relates more to the analysis of that data.
According to Ignatow and Mihalcea [1], text mining has its roots in computer science, while text analysis is related to social science.
In general, the differences between the two disciplines are described in the following way:
- Text mining: the process of collecting and preparing textual data. It makes use of the Extract, Transform, Load (ETL) framework and relies on programming languages such as R and Python for these tasks. It’s related to the extraction of raw data and its normalization.
- Text analytics: the use of statistical and machine learning models to extract insights from text. It can use the same programming languages as text mining, but also analytics tools like Power BI. It’s related to the analysis of the normalized data.
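The split between preparing the data and analyzing it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw documents are invented, `normalize` and `tf_idf` are hypothetical helper names, and TF-IDF weighting is just one common statistical technique chosen here as a stand-in for the analysis step.

```python
import math
import re
from collections import Counter

# Hypothetical raw inputs, as they might come out of an extraction step.
raw_documents = [
    "  <p>Shipping was SLOW!!</p> ",
    "Shipping was fast; support was helpful.",
    "Support never answered my email...",
]


def normalize(text):
    """Preparation step: strip markup, lowercase, keep alphabetic tokens."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop HTML-like tags
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(token_lists):
    """Analysis step: weight each term by TF * log(N / DF) per document."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


normalized = [normalize(doc) for doc in raw_documents]
weights = tf_idf(normalized)

# Show the highest-weighted term of each document.
for doc_weights in weights:
    print(max(doc_weights, key=doc_weights.get))
```

Terms that appear in several documents, such as "shipping", receive a lower weight than terms that are specific to one document, which is what makes the weights useful for spotting what each text is about.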
What are text mining and text analytics applications?
Text mining and text analysis can be used to get high-quality information from text, and to understand how language is used and how the world itself organizes around language.
Textual analysis gives us useful information about the person who wrote or said something, especially in the case of social media.
Since it’s a computer science method, text mining can deal with large-scale text extraction and treatment. This allows companies to extract business insights from internal documents and from exchanges with customers by email, social media, or chatbots.
For the marketing industry, it’s possible to use text mining and text analytics for sentiment detection and customer service management. According to an article on the MonkeyLearn site [2], text analytics can be used to automatically tag tickets, detect language, and so on.
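To make sentiment detection and ticket tagging more concrete, here is a deliberately naive, rule-based sketch. It is not how the cited article or any commercial tool implements these features (those typically rely on trained machine learning models). The word lists, tag names, and function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical word lists and tag keywords, invented for this sketch.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"broken", "slow", "refund", "terrible"}
TAG_KEYWORDS = {
    "billing": {"invoice", "refund", "charge"},
    "shipping": {"delivery", "package", "shipping"},
}


def tokens(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def sentiment(text):
    """Label the text by comparing positive and negative word hits."""
    words = tokens(text)
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tag_ticket(text):
    """Return the sorted tags whose keywords appear in the ticket."""
    words = tokens(text)
    return sorted(tag for tag, kws in TAG_KEYWORDS.items() if words & kws)


ticket = "My package arrived broken and I want a refund."
print(sentiment(ticket), tag_ticket(ticket))
```

A word-list approach like this breaks down quickly (it ignores negation, sarcasm, and context), which is why real systems usually train a classifier on labeled examples instead.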
In education, Oliveira, Azevedo, and Gomes [3] discuss the use of text mining tools to evaluate students’ written work and help teachers identify the gaps between what is being taught and what is actually being assimilated.
What’s text mining: the tools
The main tool to analyze text will be a programming language such as Python or R. In Python, you can use libraries such as NumPy, pandas, NLTK, Gensim, spaCy, scikit-learn, Beautiful Soup, Scrapy, PyTorch, TensorFlow, and many others.
You can also use more visual tools to analyze text, such as RapidMiner, Weka, or Power BI.
How to learn text mining?
You can start by taking some courses such as the ones below:
- Text Mining and Analytics
- Applied Text Mining in Python
- Text Retrieval and Search Engines
- Hands-on Text Mining and Analytics
Or read the articles below:
- What Is Text Mining? A Beginner’s Guide
- How to Get Started with Deep Learning for Natural Language Processing
References
- 1. Ignatow G, Mihalcea R. An Introduction to Text Mining: Research Design, Data Collection, and Analysis. SAGE Publications; 2017.
- 2. What Is Text Mining? A Beginner’s Guide. MonkeyLearn. Accessed January 26, 2021. https://monkeylearn.com/text-mining/#:~:text=Text%20analytics%20is%20usually%20used,based%20on%20their%20previous%20experience.
- 3. Gonçalves de Oliveira L, Terra Azevedo BF, Basílio Almeida Gomes C. Softwares de mineração de texto na análise de produções textuais de estudantes do ensino fundamental: possibilidades interdisciplinares. 752. Accessed January 28, 2021. https://seer.faccat.br/index.php/redin/article/viewFile/1108/752