DATA Step, Macro, Functions and more

Interpret text

Posts: 73

Interpret text


I have a large dataset with dosage information as text strings. I am coverting these into numerical information.

Same dosages can be written in many different ways. Is there any way to convert this information to numeric is a "easy" way?

Example of dosages:


1 tablet 1 times daily   (should correspond to the numeric value of 1)

1 TABLET. 1 TIMES DAILY (should correspond to the numeric value of 1)

1 TABL. 1 TIME DAILY (should correspond to the numeric value of 1)

1 tabl daily  (should correspond to the numeric value of 1)

1 TABLET DAILY  (should correspond to the numeric value of 1)



Can you use perl expressions? (No experience on this)


Thank for any help!


Super User
Super User
Posts: 7,392

Re: Interpret text

Why is dose information in a text field, that is madness.  I would immeidately send it back to source and ask them to fix it.  Sure you could try doing some string parsing on the data, but the sheer number of permutations makes it very difficult.  Not to mention that some "doses" will require medical input as to what is what.  In your example, why does 1 tablet daily = 1?  Does it only last one day?  If not then dose is not 1 but cumulation of number of days taken.  

This is a good example of why, at database creation, every care should be taken to avoid free text fields.

If it was me and I had to do this, I would produce a listing of distinct values, send it to the medic/DM, and ask them to provide decodes for each of the values.  Then read that file in and code based on responses.

Posts: 73

Re: Interpret text

Thank you RW9!

I agree that it is of course a bad idea having this information as string. But unfortunately that is how it is stored now. I have manually evaluated all dosages and transformed the into numeric. I am asking now since I would like to evaluate the perfomance of string parsing (or something else) compared to my manual evaluation. In my example 1 tablet daily was corresponding to the daily numeric dose of 1 but it says nothing about the duration of treatment.




Valued Guide
Posts: 797

Re: Interpret text

I guess you could write code to try to standardize the text (i.e. covert "tabl" to "tablet", and "per day" to "daily").  Then parse the standardized code.  But I think you should consiider using the results of your  manual effort to construct a lookup table, which you can use on new data.  That would leave only never-seen-before text to parse (and ultimately add to the lookup table).

Trusted Advisor
Posts: 1,353

Re: Interpret text

[ Edited ]

@bollibompa - you gave only one set of variations - set for tablets per day.

What other sets of text contents are available ?


I would work through next steps:

1)  use function lowcase()  

2)  use function tranw() to replace variant words into standard keywords

3)  use function findw() or index() to search for the keywords  

      and when found search for numeric amount 




Ask a Question
Discussion stats
  • 4 replies
  • 4 in conversation