Keep building nice things

Portfolio

I can talk about what I can do.
Or I can just show you.

Clinical NLP

Using the Reddit API, I built a dataset of real people's descriptions of their diseases and symptoms. The result is an NER model that you can use to extract medical entities from text.
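As a rough illustration of the dataset-building step (not the project's actual code), the sketch below turns a post and a list of known mentions into character-offset annotations of the form (start, end, label), a layout commonly used for NER training data. The `annotate` helper, the sample post, and the `SYMPTOM` label are all hypothetical.

```python
from typing import List, Tuple

Annotation = Tuple[int, int, str]  # (start_char, end_char, label)


def annotate(text: str, mentions: List[Tuple[str, str]]) -> List[Annotation]:
    """Locate each (mention, label) pair in text and return character offsets."""
    lowered = text.lower()
    spans: List[Annotation] = []
    for mention, label in mentions:
        start = lowered.find(mention.lower())
        if start == -1:
            continue  # mention not present in this post
        end = start + len(mention)
        # Skip spans that overlap one already recorded; NER spans must not overlap.
        if any(start < e and s < end for s, e, _ in spans):
            continue
        spans.append((start, end, label))
    return sorted(spans)


# Hypothetical post and labels, for illustration only.
post = "I have had a migraine and nausea since Monday."
spans = annotate(post, [("migraine", "SYMPTOM"), ("nausea", "SYMPTOM")])
example = (post, {"entities": spans})
```

Only the first occurrence of each mention is annotated here; a real pipeline would need to handle repeats, tokenization boundaries, and manual review.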

Clone the repository, check the model in action, and read more about this project.

Skills & tools: Python, REST API, SQL, Pandas, spaCy, Hugging Face Transformers, Docker, Azure, data mining, and data cleaning.

Medical NER: extracting named entities with spaCy, Docker, and Azure.

Smarter cities

I have been using Natural Language Processing for text mining and to extract useful information from data.

For Essencialia, I use techniques such as POS tagging (to extract locations from text), sentiment analysis (to find news articles about violent districts), and parsing techniques to clean and process text.
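The cleaning step might look something like the sketch below, which uses only regular expressions from the standard library. It is an assumption about the general approach, not the code behind Essencialia, and the patterns and sample text are hypothetical.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")        # HTML tags left over from scraping
URL_RE = re.compile(r"https?://\S+")   # bare links
SPACE_RE = re.compile(r"\s+")          # runs of whitespace, including newlines


def clean_text(raw: str) -> str:
    """Strip tags and links, then collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = URL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


# Hypothetical scraped snippet, for illustration only.
sample = (
    "<p>Robbery reported in  Centro.</p>\n"
    "<a href='https://example.com/news'>https://example.com/news</a>"
)
cleaned = clean_text(sample)
```

Text cleaned this way can then be passed to a tagger or a sentiment classifier.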

Skills & tools: spaCy, Python, Regex, scikit-learn, Machine Learning, Natural Language Processing, Aruana & Atalaia (proprietary Python libraries that I have created for text processing).

Example of tagged text for spaCy

Data Mining

I developed the full ETL process for Essencialia.com. Every day, I scrape real estate listing sites using Selenium and Python to extract unstructured real estate data for the city of Aracaju (Brazil).

Data is preprocessed, cleaned and loaded into a Postgres database.
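To give an idea of the extract, clean, and load steps, here is a minimal, self-contained sketch. It makes several assumptions so that it runs without external services: the standard library's `html.parser` stands in for Selenium and BeautifulSoup, an in-memory SQLite database stands in for Postgres, and the markup, class names, and table layout are hypothetical.

```python
import re
import sqlite3
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the text of elements whose class is 'price' or 'district'."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished listings
        self._field = None   # field currently being read
        self._current = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in ("price", "district"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()
            self._field = None
            if len(self._current) == 2:  # both fields seen: listing complete
                self.rows.append(self._current)
                self._current = {}


def clean_price(raw: str) -> float:
    """Turn a Brazilian-formatted price such as 'R$ 350.000,00' into 350000.0."""
    digits = re.sub(r"[^\d,]", "", raw)  # drop currency symbol and thousands dots
    return float(digits.replace(",", "."))


# Hypothetical markup; real listing sites will differ.
page = """
<div><span class="district">Atalaia</span><span class="price">R$ 350.000,00</span></div>
<div><span class="district">Centro</span><span class="price">R$ 1.200.000,00</span></div>
"""

parser = ListingParser()
parser.feed(page)

conn = sqlite3.connect(":memory:")  # stand-in for the Postgres database
conn.execute("CREATE TABLE listings (district TEXT, price REAL)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [(r["district"], clean_price(r["price"])) for r in parser.rows],
)
conn.commit()
loaded = conn.execute(
    "SELECT district, price FROM listings ORDER BY price"
).fetchall()
```

The same three stages apply with a real browser driver and a Postgres connection; only the extraction and connection code would change.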

Skills & tools: Selenium, Python, PostgreSQL, SQL, BeautifulSoup, Pandas, Statistics.

Essencialia.com - Screenshot

Computer Vision

Computers can see, and the consequences and opportunities are huge. In 2019, I developed this small program to anonymize faces in real-time video, with data privacy in mind.

As a Data Scientist, I will always work to mitigate the bad consequences of using AI. This means no assassin drones, no voice assistants hearing someone’s private conversations, and no malicious use of AI.

Ethics is not a choice. It’s a responsibility towards society.

Skills & tools: OpenCV, Computer Vision, Face detection, Data Privacy.

Data Visualization and Web Development

All the real estate findings are presented on Essencialia.com. I use a mix of Tableau embeds and HTML to present the information in a fun and interactive way.

Data is analysed and transferred to a WordPress website using PHP. During the exploratory analysis step, I use Matplotlib, Pandas, SQL, and Python to explore data. 

Skills & tools: Tableau, Data visualization, SQL, Pandas, Matplotlib, Python, WordPress, HTML, CSS, PHP, JavaScript, TranslatePress, Elementor.

More projects

Blogging

Blogging is an essential part of communicating data and sharing knowledge. I blog and develop free content and tutorials to help other data scientists.

Development

Atalaia is a personal NLP library that I use for prototyping. You can fork the latest publicly available version by clicking here.