BookmarkSubscribeRSS Feed
bollibompa
Quartz | Level 8

Hi,

I have a large dataset with dosage information as text strings. I am coverting these into numerical information.

Same dosages can be written in many different ways. Is there any way to convert this information to numeric is a "easy" way?

Example of dosages:

 

1 tablet 1 times daily   (should correspond to the numeric value of 1)

1 TABLET. 1 TIMES DAILY (should correspond to the numeric value of 1)

1 TABL. 1 TIME DAILY (should correspond to the numeric value of 1)

1 tabl daily  (should correspond to the numeric value of 1)

1 TABLET DAILY  (should correspond to the numeric value of 1)

etc.

 

Can you use perl expressions? (No experience on this)

 

Thank for any help!

Thomas

4 REPLIES 4
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Why is dose information in a text field, that is madness.  I would immeidately send it back to source and ask them to fix it.  Sure you could try doing some string parsing on the data, but the sheer number of permutations makes it very difficult.  Not to mention that some "doses" will require medical input as to what is what.  In your example, why does 1 tablet daily = 1?  Does it only last one day?  If not then dose is not 1 but cumulation of number of days taken.  

This is a good example of why, at database creation, every care should be taken to avoid free text fields.

If it was me and I had to do this, I would produce a listing of distinct values, send it to the medic/DM, and ask them to provide decodes for each of the values.  Then read that file in and code based on responses.

bollibompa
Quartz | Level 8

Thank you RW9!

I agree that it is of course a bad idea having this information as string. But unfortunately that is how it is stored now. I have manually evaluated all dosages and transformed the into numeric. I am asking now since I would like to evaluate the perfomance of string parsing (or something else) compared to my manual evaluation. In my example 1 tablet daily was corresponding to the daily numeric dose of 1 but it says nothing about the duration of treatment.

 

/Thomas

 

mkeintz
PROC Star

I guess you could write code to try to standardize the text (i.e. covert "tabl" to "tablet", and "per day" to "daily").  Then parse the standardized code.  But I think you should consiider using the results of your  manual effort to construct a lookup table, which you can use on new data.  That would leave only never-seen-before text to parse (and ultimately add to the lookup table).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Shmuel
Garnet | Level 18

@bollibompa - you gave only one set of variations - set for tablets per day.

What other sets of text contents are available ?

 

I would work through next steps:

1)  use function lowcase()  

2)  use function tranw() to replace variant words into standard keywords

3)  use function findw() or index() to search for the keywords  

      and when found search for numeric amount 

 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 771 views
  • 0 likes
  • 4 in conversation