Hello,
I am having trouble extracting the dose and unit information from the below character variable and appreciate any help.
data have:
dose_all |
125 units, single |
200 units |
240 iu, single |
25 units prn im |
? other frequency:q 90 days;? |
100 u |
300 iu, single |
400 units once im |
between 100 to 200 units, single |
Data want (only dose, unit) are most important to extract.
dose_all | dose | unit | x1 | x2 | x3 | x3 |
125 units, single | 125 | units | single | |||
200 units | 200 | units | ||||
240 iu, single | 240 | iu | single | |||
25 units prn im | 25 | units | prn | im | ||
? other frequency:q 90 days;? | ? | other | frequency:q | 90 | days;? | |
100 u | 100 | u | ||||
300 iu, single | 300 | iu | single | |||
400 units once im | 400 | units | once | im | ||
between 100 to 200 units, single | between 100 to 200 | units | single |
Thanks
If dose is the most important..
data have;
input dose_all $50.;
datalines4;
125 units, single
200 units
240 iu, single
25 units prn im
? other frequency:q 90 days;?
100 u
300 iu, single
400 units once im
between 100 to 200 units, single
;;;;
data want;
set have;
dose = substr(dose_all, 1, min(find(dose_all, "?"), prxmatch("/\d(?!.*\d)/", dose_all)));
run;
Thanks for responding. There was an error in the log:
NOTE: Invalid third argument to function SUBSTR at line 566 column 11.
I would just take the beginning of the string up until you see a character that could not be part of a number.
data have;
input dose_all $50.;
cards4;
125 units, single
200 units
240 iu, single
25 units prn im
? other frequency:q 90 days;?
100 u
300 iu, single
400 units once im
between 100 to 200 units, single
1,200 mg
;;;;
data want ;
set have;
loc = verify(dose_all,'0123456789.,');
if loc>1 then do;
dose=input(substr(dose_all,1,loc-1),??comma32.);
units=substr(dose_all,loc);
end;
drop loc;
run;
Obs dose_all dose units 1 125 units, single 125 units, single 2 200 units 200 units 3 240 iu, single 240 iu, single 4 25 units prn im 25 units prn im 5 ? other frequency:q 90 days;? . 6 100 u 100 u 7 300 iu, single 300 iu, single 8 400 units once im 400 units once im 9 between 100 to 200 units, single . 10 1,200 mg 1200 mg
Thanks! It worked but for some observations, as seen below, I am getting a missing value for the extracted dose.
7 injections 25 units each, injected into spinal area
quantity:155 units;other frequency:1 set of injection;?
inject 200 unit every 3 mo intra-muscularly
frequency - 155 units every 3 months i.m?recent
dose quantity: 16, dose unit: units
dose - 400u?frequency q90ds
? quantity:155 iu international unit(s);other route:injection?
Welcome to the world of data cleaning.
Short of getting into natural language processing I am not sure how much more effort you really want to go with trying to automate this.
You might just want to build these special cases into a lookup file (or FORMAT).
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.