BookmarkSubscribeRSS Feed
ama220
Obsidian | Level 7

Hello,

I am having trouble extracting the dose and unit information from the below character variable and appreciate any help.

 

data have:

dose_all
125 units, single
200 units
240 iu, single
25 units prn im
? other frequency:q 90 days;?
100 u
300 iu, single
400 units once im
between 100 to 200 units, single

 

Data want (only dose, unit) are most important to extract. 

dose_alldoseunit x1x2x3x3
125 units, single125unitssingle   
200 units200units    
240 iu, single240iusingle   
25 units prn im25unitsprnim  
? other frequency:q 90 days;??otherfrequency:q90days;? 
100 u100u    
300 iu, single300iusingle   
400 units once im400unitsonceim  
between 100 to 200 units, singlebetween 100 to 200unitssingle   

 

Thanks

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

If dose is the most important..

 

data have;
input dose_all $50.;
datalines4;
125 units, single               
200 units                       
240 iu, single                  
25 units prn im                 
? other frequency:q 90 days;?   
100 u                           
300 iu, single                  
400 units once im               
between 100 to 200 units, single
;;;;

data want;
   set have;
   dose = substr(dose_all, 1, min(find(dose_all, "?"), prxmatch("/\d(?!.*\d)/", dose_all)));
run;
ama220
Obsidian | Level 7

Thanks for responding. There was an error in the log: 
NOTE: Invalid third argument to function SUBSTR at line 566 column 11. 

Tom
Super User Tom
Super User

I would just take the beginning of the string up until you see a character that could not be part of a number.

data have;
  input dose_all $50.;
cards4;
125 units, single
200 units
240 iu, single
25 units prn im
? other frequency:q 90 days;?
100 u
300 iu, single
400 units once im
between 100 to 200 units, single
1,200 mg
;;;;

data want ;
  set have;
  loc = verify(dose_all,'0123456789.,');
  if loc>1 then do;
     dose=input(substr(dose_all,1,loc-1),??comma32.);
     units=substr(dose_all,loc);
  end;
  drop loc;
run;
Obs    dose_all                            dose    units

  1    125 units, single                    125    units, single
  2    200 units                            200    units
  3    240 iu, single                       240    iu, single
  4    25 units prn im                       25    units prn im
  5    ? other frequency:q 90 days;?          .
  6    100 u                                100    u
  7    300 iu, single                       300    iu, single
  8    400 units once im                    400    units once im
  9    between 100 to 200 units, single       .
 10    1,200 mg                            1200    mg
ama220
Obsidian | Level 7

Thanks! It worked but for some observations, as seen below, I am getting a missing value for the extracted dose.

7 injections 25 units each, injected into spinal area

quantity:155 units;other frequency:1 set of injection;?

inject 200 unit every 3 mo intra-muscularly

frequency - 155 units every 3 months i.m?recent

dose quantity: 16, dose unit: units

dose - 400u?frequency q90ds

? quantity:155 iu international unit(s);other route:injection?

Tom
Super User Tom
Super User

Welcome to the world of data cleaning.

Short of getting into natural language processing I am not sure how much more effort you really want to go with trying to automate this.

 

You might just want to build these special cases into a lookup file (or FORMAT).

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1707 views
  • 2 likes
  • 3 in conversation