BookmarkSubscribeRSS Feed
ama220
Obsidian | Level 7

Hello,

I am having trouble extracting the dose and unit information from the below character variable and appreciate any help.

 

data have:

dose_all
125 units, single
200 units
240 iu, single
25 units prn im
? other frequency:q 90 days;?
100 u
300 iu, single
400 units once im
between 100 to 200 units, single

 

Data want (only dose, unit) are most important to extract. 

dose_alldoseunit x1x2x3x3
125 units, single125unitssingle   
200 units200units    
240 iu, single240iusingle   
25 units prn im25unitsprnim  
? other frequency:q 90 days;??otherfrequency:q90days;? 
100 u100u    
300 iu, single300iusingle   
400 units once im400unitsonceim  
between 100 to 200 units, singlebetween 100 to 200unitssingle   

 

Thanks

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

If dose is the most important..

 

data have;
input dose_all $50.;
datalines4;
125 units, single               
200 units                       
240 iu, single                  
25 units prn im                 
? other frequency:q 90 days;?   
100 u                           
300 iu, single                  
400 units once im               
between 100 to 200 units, single
;;;;

data want;
   set have;
   dose = substr(dose_all, 1, min(find(dose_all, "?"), prxmatch("/\d(?!.*\d)/", dose_all)));
run;
ama220
Obsidian | Level 7

Thanks for responding. There was an error in the log: 
NOTE: Invalid third argument to function SUBSTR at line 566 column 11. 

Tom
Super User Tom
Super User

I would just take the beginning of the string up until you see a character that could not be part of a number.

data have;
  input dose_all $50.;
cards4;
125 units, single
200 units
240 iu, single
25 units prn im
? other frequency:q 90 days;?
100 u
300 iu, single
400 units once im
between 100 to 200 units, single
1,200 mg
;;;;

data want ;
  set have;
  loc = verify(dose_all,'0123456789.,');
  if loc>1 then do;
     dose=input(substr(dose_all,1,loc-1),??comma32.);
     units=substr(dose_all,loc);
  end;
  drop loc;
run;
Obs    dose_all                            dose    units

  1    125 units, single                    125    units, single
  2    200 units                            200    units
  3    240 iu, single                       240    iu, single
  4    25 units prn im                       25    units prn im
  5    ? other frequency:q 90 days;?          .
  6    100 u                                100    u
  7    300 iu, single                       300    iu, single
  8    400 units once im                    400    units once im
  9    between 100 to 200 units, single       .
 10    1,200 mg                            1200    mg
ama220
Obsidian | Level 7

Thanks! It worked but for some observations, as seen below, I am getting a missing value for the extracted dose.

7 injections 25 units each, injected into spinal area

quantity:155 units;other frequency:1 set of injection;?

inject 200 unit every 3 mo intra-muscularly

frequency - 155 units every 3 months i.m?recent

dose quantity: 16, dose unit: units

dose - 400u?frequency q90ds

? quantity:155 iu international unit(s);other route:injection?

Tom
Super User Tom
Super User

Welcome to the world of data cleaning.

Short of getting into natural language processing I am not sure how much more effort you really want to go with trying to automate this.

 

You might just want to build these special cases into a lookup file (or FORMAT).

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 639 views
  • 2 likes
  • 3 in conversation