Development of the full ETL process for Essencialia.com. Every day, I scrape real estate listing sites with Selenium and Python to extract unstructured real estate data for the city of Aracaju (Brazil).
Data is preprocessed, cleaned and loaded into a Postgres database.
Skills & tools: Selenium, Python, PostgreSQL, SQL, BeautifulSoup, Pandas, Statistics.
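A minimal sketch of what one cleaning step in such a pipeline might look like, assuming scraped prices arrive as Brazilian-formatted strings (for example "R$ 350.000,00"). The helper name `parse_brl_price` and the sample strings are hypothetical illustrations, not the production Essencialia code, and the sketch uses only the Python standard library.

```python
import re
from typing import Optional

# Hypothetical pattern: "R$", optional spaces, digits with "." as the
# thousands separator, and optional ",NN" cents.
_PRICE_RE = re.compile(r"R\$\s*([\d.]+)(?:,(\d{2}))?")


def parse_brl_price(raw: str) -> Optional[float]:
    """Convert a price such as 'R$ 350.000,00' to a float, or None if absent."""
    match = _PRICE_RE.search(raw)
    if match is None:
        return None
    # Drop the thousands separators ("350.000" -> "350000").
    integer_part = match.group(1).replace(".", "")
    if not integer_part:
        return None
    cents = match.group(2) or "00"
    return float(f"{integer_part}.{cents}")


if __name__ == "__main__":
    for text in ("R$ 350.000,00", "Venda: R$1.250.000", "Preço sob consulta"):
        print(text, "->", parse_brl_price(text))
```

Returning `None` for listings without a parseable price lets the value be stored as NULL in the database instead of silently becoming zero.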
I have been using Natural Language Processing for text mining and to extract useful information from data.
For Essencialia, I use techniques such as POS tagging (to extract locations from text), sentiment analysis (to find news articles about violent districts), and parsing techniques to clean and process text.
Skills & tools: spaCy, Python, Regex, scikit-learn, Machine Learning, Natural Language Processing, Aruana & Atalaia (proprietary Python libraries that I created for text processing).
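As an illustration of the kind of regex-based parsing used to clean and process listing text, here is a minimal sketch using only the Python standard library. It is a plain regex stand-in, not POS tagging and not the Aruana or Atalaia API; the function names, the patterns, and the sample description are all hypothetical.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse runs of whitespace."""
    # NFKD splits accented letters into base letter + combining mark,
    # and also maps the superscript in "m²" to a plain "2".
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


# Hypothetical patterns for a few features found in Portuguese listings.
_PATTERNS = {
    "bedrooms": re.compile(r"(\d+)\s*quartos?"),
    "bathrooms": re.compile(r"(\d+)\s*banheiros?"),
    "area_m2": re.compile(r"(\d+)\s*m2"),
}


def extract_features(description: str) -> dict:
    """Pull simple numeric features out of a free-text listing description."""
    clean = normalize(description)
    features = {}
    for name, pattern in _PATTERNS.items():
        match = pattern.search(clean)
        if match:
            features[name] = int(match.group(1))
    return features


if __name__ == "__main__":
    sample = "Apartamento  com 3 Quartos, 2 banheiros e 85 m² na Atalaia"
    print(extract_features(sample))
```

Normalizing before matching keeps the patterns short, since they no longer need to account for capitalization, accents, or irregular spacing.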
All the real estate findings are presented on Essencialia.com. I use a mix of embedded Tableau visualizations and HTML to present the information in a fun and interactive way.
Data is analysed and transferred to a WordPress website using PHP. During the exploratory analysis step, I use Matplotlib, Pandas, SQL, and Python.