Hello,
I am having trouble extracting the dose and unit information from the below character variable and appreciate any help.
data have:
| dose_all |
| 125 units, single |
| 200 units |
| 240 iu, single |
| 25 units prn im |
| ? other frequency:q 90 days;? |
| 100 u |
| 300 iu, single |
| 400 units once im |
| between 100 to 200 units, single |
Data want (only dose, unit) are most important to extract.
| dose_all | dose | unit | x1 | x2 | x3 | x3 |
| 125 units, single | 125 | units | single | |||
| 200 units | 200 | units | ||||
| 240 iu, single | 240 | iu | single | |||
| 25 units prn im | 25 | units | prn | im | ||
| ? other frequency:q 90 days;? | ? | other | frequency:q | 90 | days;? | |
| 100 u | 100 | u | ||||
| 300 iu, single | 300 | iu | single | |||
| 400 units once im | 400 | units | once | im | ||
| between 100 to 200 units, single | between 100 to 200 | units | single |
Thanks
If dose is the most important..
data have;
input dose_all $50.;
datalines4;
125 units, single
200 units
240 iu, single
25 units prn im
? other frequency:q 90 days;?
100 u
300 iu, single
400 units once im
between 100 to 200 units, single
;;;;
data want;
set have;
dose = substr(dose_all, 1, min(find(dose_all, "?"), prxmatch("/\d(?!.*\d)/", dose_all)));
run;
Thanks for responding. There was an error in the log:
NOTE: Invalid third argument to function SUBSTR at line 566 column 11.
I would just take the beginning of the string up until you see a character that could not be part of a number.
data have;
input dose_all $50.;
cards4;
125 units, single
200 units
240 iu, single
25 units prn im
? other frequency:q 90 days;?
100 u
300 iu, single
400 units once im
between 100 to 200 units, single
1,200 mg
;;;;
data want ;
set have;
loc = verify(dose_all,'0123456789.,');
if loc>1 then do;
dose=input(substr(dose_all,1,loc-1),??comma32.);
units=substr(dose_all,loc);
end;
drop loc;
run;
Obs dose_all dose units 1 125 units, single 125 units, single 2 200 units 200 units 3 240 iu, single 240 iu, single 4 25 units prn im 25 units prn im 5 ? other frequency:q 90 days;? . 6 100 u 100 u 7 300 iu, single 300 iu, single 8 400 units once im 400 units once im 9 between 100 to 200 units, single . 10 1,200 mg 1200 mg
Thanks! It worked but for some observations, as seen below, I am getting a missing value for the extracted dose.
7 injections 25 units each, injected into spinal area
quantity:155 units;other frequency:1 set of injection;?
inject 200 unit every 3 mo intra-muscularly
frequency - 155 units every 3 months i.m?recent
dose quantity: 16, dose unit: units
dose - 400u?frequency q90ds
? quantity:155 iu international unit(s);other route:injection?
Welcome to the world of data cleaning.
Short of getting into natural language processing I am not sure how much more effort you really want to go with trying to automate this.
You might just want to build these special cases into a lookup file (or FORMAT).
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.