Medical NLP: using spaCy, Docker, and Azure for Named Entity Recognition

Lima Vallantin
Wilame
Marketing Data scientist and Master's student interested in everything concerning Data, Text Mining, and Natural Language Processing. Currently speaking Brazilian Portuguese, French, English, and a tiiiiiiiiny bit of German. Want to connect? You can send me a message. For more information about me, you can visit this page.

Today’s post will be very short, since it’s more a showcase of an ongoing project than a proper tutorial. For a few weeks now, I have been collecting and labeling medical text posted on social media. I found some great communities on Reddit where people share their symptoms and diseases in order to get medical advice (I am not collecting the advice itself, only the descriptions of each condition).

These texts are very rich in detail and can provide interesting insights about diseases and their effects. You may also find information about drug side effects, reactions to medication, and so on. One example is the passage below. Let’s examine it.

“My mom (72) is having symptoms of Covid. Her saturation fell and we got her hospitalized. She had fever.”

Which entities can you spot in this text? Maybe there’s a mention of a disease? Or of a symptom? Or even some medical jargon? My goal was to train a model to identify mentions of pathologies, body parts, and other medical terms in a sentence.
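As a rough illustration of what labeled data for this kind of task can look like, here is a minimal sketch that records entity mentions as character offsets. The tuple layout is only an assumption (the post does not show the actual annotation format) and the `span` helper is hypothetical; the labels are the ones that appear in the model output later in this post.

```python
# Hypothetical annotation record: (text, {"entities": [(start, end, label), ...]}).
# This layout is an assumption for illustration, not the author's actual format.
TEXT = "My mom (72) is having symptoms of Covid."


def span(text, phrase, label):
    """Return (start, end, label) for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


annotation = (
    TEXT,
    {
        "entities": [
            span(TEXT, "72", "AGE"),
            span(TEXT, "symptoms", "JARGON"),
            span(TEXT, "Covid", "DISEASE"),
        ]
    },
)

# Print each labeled mention with its character offsets.
for start, end, label in annotation[1]["entities"]:
    print(TEXT[start:end], start, end, label)
```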

After training the model, I wanted to use it in production. I packaged a Flask app in a Docker container and used Azure to serve it. If you want to reproduce the setup, feel free to clone my repository, which contains the model and the dockerfile, and build your own image.

Clone the repository and create the Docker image

I have pushed all the files necessary to build this application to the nerflask repository. The name is not very original, but it says what it does :P.

After cloning this repo, build a docker image with the command:

$ docker build -f dockerfile --tag medicalner .

Then, if you want to test it locally, run:

$ docker run --name medicalner -p 43656:5000 -t -i medicalner:latest

Replace 43656 with any port on your machine where you want the application to be reachable; inside the container, the app listens on port 5000. Needless to say, you need Docker installed on your computer to do all this…

When it’s running, open a browser (or use any other querying method) and type:

http://127.0.0.1:43656/predict?t=My mom (72) is having symptoms of Covid. Her saturation fell and we got her hospitalized. She had fever.
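If you would rather query the endpoint from code than from a browser, the text should be percent-encoded before it goes into the `t` parameter (the browser does this for you). Here is a minimal sketch using only the Python standard library; `build_predict_url` is a hypothetical helper, and the default host and port are simply the ones from the `docker run` example above.

```python
from urllib.parse import urlencode


def build_predict_url(text, host="127.0.0.1", port=43656):
    """Build the /predict URL with the text safely encoded in the `t` parameter."""
    query = urlencode({"t": text})  # encodes spaces, parentheses, etc.
    return f"http://{host}:{port}/predict?{query}"


sample = "My mom (72) is having symptoms of Covid. She had fever."
url = build_predict_url(sample)
print(url)
```

With the container running, passing this URL to `urllib.request.urlopen` (or any other HTTP client) should return the same kind of JSON response as the browser.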

Hit enter and you should get the following result:

{
   "text": "My mom (72) is having symptoms of Covid. Her saturation fell and we got her hospitalized. She had fever.", 
   "predictions": [
     {
       "token": "72", 
       "start": 8, 
       "end": 10, 
       "entity": "AGE"
     }, 
     {
       "token": "symptoms", 
       "start": 22, 
       "end": 30, 
       "entity": "JARGON"
     }, 
     {
       "token": "covid", 
       "start": 34, 
       "end": 39, 
       "entity": "DISEASE"
     }, 
     {
       "token": "saturation fell", 
       "start": 4, 
       "end": 19, 
       "entity": "DISEASE"
     }, 
     {
       "token": "hospitalized", 
       "start": 35, 
       "end": 47, 
       "entity": "JARGON"
     }, 
     {
       "token": "fever", 
       "start": 8, 
       "end": 13, 
       "entity": "SYMPTOM"
     }
   ]
 }
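Once you have the response, it is easy to post-process. The sketch below groups the predicted tokens by entity label. It also illustrates something visible in the sample: the `start` and `end` values appear to be counted from the beginning of each sentence rather than from the beginning of the whole text (for example, "fever" sits at 8 to 13 within "She had fever."). Both helper functions are hypothetical, and the sentence splitting is a naive assumption that only suits this example.

```python
import json
import re
from collections import defaultdict

# Copy of the response shown above.
RESPONSE = json.loads("""
{
  "text": "My mom (72) is having symptoms of Covid. Her saturation fell and we got her hospitalized. She had fever.",
  "predictions": [
    {"token": "72", "start": 8, "end": 10, "entity": "AGE"},
    {"token": "symptoms", "start": 22, "end": 30, "entity": "JARGON"},
    {"token": "covid", "start": 34, "end": 39, "entity": "DISEASE"},
    {"token": "saturation fell", "start": 4, "end": 19, "entity": "DISEASE"},
    {"token": "hospitalized", "start": 35, "end": 47, "entity": "JARGON"},
    {"token": "fever", "start": 8, "end": 13, "entity": "SYMPTOM"}
  ]
}
""")


def group_by_entity(payload):
    """Map each entity label to the list of tokens predicted with that label."""
    grouped = defaultdict(list)
    for pred in payload["predictions"]:
        grouped[pred["entity"]].append(pred["token"])
    return dict(grouped)


def sentence_of(payload, pred):
    """Return the index of the sentence whose [start:end] slice matches the token.

    Assumes sentences end with a period followed by whitespace, which holds
    for this example but is not a general-purpose sentence splitter.
    """
    sentences = re.split(r"(?<=\.)\s+", payload["text"])
    for i, sentence in enumerate(sentences):
        if sentence[pred["start"]:pred["end"]].lower() == pred["token"].lower():
            return i
    return None


print(group_by_entity(RESPONSE))
for pred in RESPONSE["predictions"]:
    print(pred["token"], "-> sentence", sentence_of(RESPONSE, pred))
```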

I have also created a notebook so you can test your own text. Click to open the interactive version on Binder.

Send it to Azure

This part is long but very well documented elsewhere, so I won’t spend much time on it. If you want to deploy a Docker image to Azure, a straightforward way is to use VS Code with the Azure and Docker extensions.

This tutorial shows how you can do it, step by step: Tutorial: Deploy Docker containers to Azure App Service with Visual Studio Code.

That’s all! Now you can extract medical entities from text 😀

My small contribution to the medical NLP field

Named Entity Recognition is a nice way to extract and structure data. It’s also one of the most popular NLP tasks. It’s no secret that health data is very personal and should be used carefully, which is why it is sometimes so hard to find good datasets for healthcare NLP.

This project is my small contribution to the community. I hope that by building more domain-specific datasets we can advance the medical field and improve the quality of life of patients and their families.
