Carbon
Calcite | Level 5

 Hello,

 

I need to create two variables, one with the dosage (numeric) and another with the unit of measurement (character: mg, mgs, mL or iu), from free-text medication records. The records do not appear to follow any structured pattern, and some entries complicate scanning for the digits, for example:

 

DRUG 357860 25mg TAB - [problem: two numbers and only one of them is dosage, another is a part of drug name]

Drug_1/Drug_2 1 liquid 50/8mL - [problem: two numbers for two different drugs separated by /]

Drug 250mgs TAB (250mgs) - [problem: double entry of the same dosage]

Drug 80 iu text - [space between dosage and unit]

 

The predominant format is DRUG NAME 123mg TEXT. For observations that contain two drugs with two dosage values, I would create two separate dosage variables, one for Drug 1 and one for Drug 2.

 

I tried to use scan and compress, but I don't know how to separate numbers that are part of a drug name from dosage values. Any ideas on how to approach this?

 

I would appreciate any advice. My apologies if I missed similar postings. I tried to search, but couldn't find an exact match. 

 

Thank you,

error_prone
Barite | Level 11
Do you know all the measurement units that can appear in the data? I would start with some regular expressions, such as /(\d+)\s*mg/i
With prxposn, the contents of the capture group (the part in parentheses) can be extracted.
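
As a minimal sketch of how the three functions fit together (the dataset name `doses`, the variable names and the unit list are illustrative assumptions, not a tested solution for the full data): PRXPARSE compiles the pattern, PRXMATCH reports whether and where it matches, and PRXPOSN returns the text captured by a numbered pair of parentheses. The pattern deliberately has no leading `.*`; being greedy, that would consume all but the last digit of the number before the capture group gets a turn. Because the number must be followed by a unit, the 357860 in the first record is skipped.

```sas
data doses;
   length text $60 unit $4;
   infile datalines truncover;
   input text $char60.;

   /* Compile the pattern once and keep its id across observations. */
   /* The trailing i makes the match case-insensitive.              */
   retain re;
   if _n_ = 1 then re = prxparse('/(\d+(?:\.\d+)?)\s*(mgs?|ml|iu)\b/i');

   /* PRXMATCH returns the start position of the match, 0 if none. */
   if prxmatch(re, text) then do;
      dose = input(prxposn(re, 1, text), best32.);  /* group 1: the number */
      unit = lowcase(prxposn(re, 2, text));         /* group 2: the unit   */
   end;

   drop re;
   datalines;
DRUG 357860 25mg TAB
Drug 250mgs TAB (250mgs)
Drug 80 iu text
;
run;
```

Only the first match in each record is used, so the repeated (250mgs) is ignored. The `\b` at the end requires the unit to end at a word boundary; records where the unit runs straight into the next word would need it removed.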
RW9
Diamond | Level 26

From free text it is going to be very hard to extract anything of use. In circumstances like this it is advisable to get the data coded by a medical professional, because things like the drug and the <text> part may affect dose, frequency, etc. Simple text matching will not work on this. It is one reason why most companies set up databases to collect the various data points in their own separate fields. You have given an example of this yourself:

Drug_1/Drug_2 1 liquid 50/8mL

Should this be recorded as drug combination xyz at y dose, or as two separate drugs at differing doses?

How will you be encoding this further - i.e. to global standard medical dictionaries?

Carbon
Calcite | Level 5

Hello,

 

Thank you very much for getting back to me and for your feedback. I can now provide a detailed extract of my data (substring of interest is highlighted):

 

DRUG 10MG TAB (INGREDIENT)

DRUG 10MCG/HRTRANPATCH (INGREDIENT)

DRUG50MCGTAB (INGREDIENT)

DRUG 50/8MG TAB (INGREDIENT, INGREDIENT)

DRUG100/25TAB (INGREDIENT; INGREDIENT)

DRUG13.125GSACH (PWDR) (INGREDIENT 3350; INGREDIENT; INGREDIENT)

DRUG SZ 50MG TAB

DRUG SZ 500MGCAP

DRUG 5MG/ML MIXT (INGREDIENT)

DRUG 20MCG/H

DRUG 3.3G/5ML

DRUG 1.25G BLUE (INGREDIENT BLUE 1.25G TAB)

DRUG D 1000IU CAP

DRUG500MG/5MLTEXT (INGREDIENT INGREDIENT)

DRUG240 240MG/5 EL (INGREDIENT)

DRUG SR-CAP 200MG/25MG (INGREDIENT/INGREDIENT)

DRUG 5/2.5MG TAB (INGREDIENT;INGREDIENT)

DRUG TAB 310/.35MG (INGREDIENT; INGREDIENT)

DRUG MR-TAB 100MG 100MG MR-TAB (INGREDIENT)

DRUG 5/5 5MG/5MG TAB (INGREDIENT/INGREDIENT)

DRUG 600MG/200IU TAB (INGREDIENT & INGREDIENT)

 

The units of measurement are g, mg, mcg, ml and iu. As for the issue with two drugs, there is a variable that identifies the two generic components of the medication (INGREDIENT) using the ATC code. I just need to extract the corresponding dosage values and units.

 

Could you please give me an example of using prxposn based on my data? I understand that it also involves prxparse and prxmatch, but I have only a very sketchy understanding of these functions.

 

Thank you. 

RW9
Diamond | Level 26

Yes, unfortunately there is no one pattern for programming this. You will need to set out a whole series of possibilities: is scan(2) a dose, is the position of the first number a dose, etc.

Then there is, as I mentioned previously, a judgement call on the data. Say you have an aspirin 100mg tab combination (just an example I made up): should the dose be the tablet dose, or the aspirin part of the tablet? This is why I recommend you get a medical review of the data.
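
One way to lay out such a series of possibilities is an ordered set of patterns, tried from most to least specific, with a variable recording which rule fired so that doubtful rows can be reviewed. The sketch below is only that: the two rules, the names `parsed`, `rule`, `dose1`, `unit1`, `dose2` and `unit2`, and the decision to give a bare first number the unit of the second are all assumptions made for illustration, applied to a few of the records posted above.

```sas
data parsed;
   length text $80 rule $6 unit1 unit2 $3;
   infile datalines truncover;
   input text $char80.;

   retain re_pair re_single;
   if _n_ = 1 then do;
      /* Rule 1: two numbers split by "/", with a unit after the second. */
      re_pair = prxparse(
         '/(\d*\.?\d+)\s*(mcg|mg|ml|iu|g)?\/(\d*\.?\d+)\s*(mcg|mg|ml|iu|g)/i');
      /* Rule 2: one number followed by a unit. */
      re_single = prxparse('/(\d*\.?\d+)\s*(mcg|mg|ml|iu|g)/i');
   end;

   if prxmatch(re_pair, text) then do;
      rule  = 'pair';
      dose1 = input(prxposn(re_pair, 1, text), best32.);
      unit1 = lowcase(prxposn(re_pair, 2, text));
      dose2 = input(prxposn(re_pair, 3, text), best32.);
      unit2 = lowcase(prxposn(re_pair, 4, text));
      /* Assumption: a bare first number (50/8MG) shares the second unit. */
      if missing(unit1) then unit1 = unit2;
   end;
   else if prxmatch(re_single, text) then do;
      rule  = 'single';
      dose1 = input(prxposn(re_single, 1, text), best32.);
      unit1 = lowcase(prxposn(re_single, 2, text));
   end;
   else rule = 'none';   /* no number-plus-unit found: review by hand */

   drop re_pair re_single;
   /* DATALINES4 is used because some records contain semicolons. */
   datalines4;
DRUG 10MG TAB (INGREDIENT)
DRUG50MCGTAB (INGREDIENT)
DRUG 50/8MG TAB (INGREDIENT, INGREDIENT)
DRUG100/25TAB (INGREDIENT; INGREDIENT)
DRUG TAB 310/.35MG (INGREDIENT; INGREDIENT)
DRUG 600MG/200IU TAB (INGREDIENT & INGREDIENT)
DRUG500MG/5MLTEXT (INGREDIENT INGREDIENT)
;;;;
run;
```

The unit alternatives are listed with mcg and mg before g so that the longer units are tried first, and there is no word-boundary check because units in this data often run straight into the next word (50MCGTAB). Note that DRUG500MG/5MLTEXT is picked up by the pair rule although it reads as a strength per volume rather than two doses, and DRUG100/25TAB falls through to 'none' because it carries no unit at all. Rows like these are where the judgement call described above still has to be made.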

Carbon
Calcite | Level 5

Hello,

 

Thank you kindly for taking the time to reply to my question. Much appreciated.

Best regards,
