Lapis Lazuli | Level 10

Extract the values from the text for calculation

I need to extract the value of drug dosage given in the Morning, Afternoon and Night for the calculation from the scratch set.

Here I given the sample dataset and expected output for your reference.

``````data have;
input Drugs&\$500.;
cards;
Variable_Dose Unit MORNING(V) x90 days Morning 28 unit, Afternoon 0 unit, Night 0 unit
Variable_Dose Unit MORNING(V) , Morning 20 unit, ,
(Mor-Aft) , Morning 15 unit, Afternoon 12 unit, Morning 16 unit, Afternoon 10 unit,
(Mor-Aft) , Morning 015 unit Afternoon 10 unit
(Mor-Aft) , Morning 0 unit, Afternoon 0 unit, Night 15 unit, , , Night 15 unit
(N) , Night 10 unit LAST DOST``````

Expected Output:

 Morning Afternoon Night 28 0 0 20 31 32 15 10 0 0 30 10

(or)

If it a huge process to SUM the unit values, at-least help me to extract the keyword like below as output:

 Col1 Col2 Col3 Col4 Morning 28 unit Afternoon 0 unit Night 0 unit Morning 20 unit Morning 15 unit Afternoon 12 unit Morning 16 unit Afternoon 10 unit Morning 015 unit Afternoon 10 unit Morning 0 unit Afternoon 0 unit Night 15 unit Night 15 unit Night 10 unit

The extraction process should be done with in the Keyword/Syntax : Day Value Unit

Any help will be appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Rhodochrosite | Level 12

Re: Extract the values from the text for calculation

Here is one way to solve the problem. The principle is to scan the string for words separated by blanks. If a keyword (Morning, Afternoon, Night) is found, the next "word" is added as value to the corresponding sum variable.

``````data have;
input Drugs&\$500.;
cards;
Variable_Dose Unit MORNING(V) x90 days Morning 28 unit, Afternoon 0 unit, Night 0 unit
Variable_Dose Unit MORNING(V) , Morning 20 unit, ,
(Mor-Aft) , Morning 15 unit, Afternoon 12 unit, Morning 16 unit, Afternoon 10 unit,
(Mor-Aft) , Morning 015 unit Afternoon 10 unit
(Mor-Aft) , Morning 0 unit, Afternoon 0 unit, Night 15 unit, , , Night 15 unit
(N) , Night 10 unit LAST DOST
;
run;

data want(drop=Drugs tmpstr i);
set have;
length tmpstr \$20;
call missing (of Morning Afternoon Night);
do i = 1 to countw(Drugs,' ')-1;
tmpstr = scan(Drugs,i,' ');
if tmpstr = 'Morning' then Morning + input(scan(Drugs,i+1,' '),??8.);
else if tmpstr = 'Afternoon' then Afternoon + input(scan(Drugs,i+1,' '),??8.);
else if tmpstr = 'Night' then Night + input(scan(Drugs,i+1,' '),??8.);
end;
run;``````

4 REPLIES 4
Super User

Re: Extract the values from the text for calculation

Is one of the rules that you did not state that when a record has multiples of morning, afternoon or night that they are SUMMED?

Any other rule(s) not stated?

Lapis Lazuli | Level 10

Re: Extract the values from the text for calculation

I apologize for not stating that point, There is no other rule need to be stated.
Thank you for your response @ballardw
Rhodochrosite | Level 12

Re: Extract the values from the text for calculation

Here is one way to solve the problem. The principle is to scan the string for words separated by blanks. If a keyword (Morning, Afternoon, Night) is found, the next "word" is added as value to the corresponding sum variable.

``````data have;
input Drugs&\$500.;
cards;
Variable_Dose Unit MORNING(V) x90 days Morning 28 unit, Afternoon 0 unit, Night 0 unit
Variable_Dose Unit MORNING(V) , Morning 20 unit, ,
(Mor-Aft) , Morning 15 unit, Afternoon 12 unit, Morning 16 unit, Afternoon 10 unit,
(Mor-Aft) , Morning 015 unit Afternoon 10 unit
(Mor-Aft) , Morning 0 unit, Afternoon 0 unit, Night 15 unit, , , Night 15 unit
(N) , Night 10 unit LAST DOST
;
run;

data want(drop=Drugs tmpstr i);
set have;
length tmpstr \$20;
call missing (of Morning Afternoon Night);
do i = 1 to countw(Drugs,' ')-1;
tmpstr = scan(Drugs,i,' ');
if tmpstr = 'Morning' then Morning + input(scan(Drugs,i+1,' '),??8.);
else if tmpstr = 'Afternoon' then Afternoon + input(scan(Drugs,i+1,' '),??8.);
else if tmpstr = 'Night' then Night + input(scan(Drugs,i+1,' '),??8.);
end;
run;``````

Super User

Re: Extract the values from the text for calculation

```data have;
input Drugs&\$500.;
cards;
Variable_Dose Unit MORNING(V) x90 days Morning 28 unit, Afternoon 0 unit, Night 0 unit
Variable_Dose Unit MORNING(V) , Morning 20 unit, ,
(Mor-Aft) , Morning 15 unit, Afternoon 12 unit, Morning 16 unit, Afternoon 10 unit,
(Mor-Aft) , Morning 015 unit Afternoon 10 unit
(Mor-Aft) , Morning 0 unit, Afternoon 0 unit, Night 15 unit, , , Night 15 unit
(N) , Night 10 unit LAST DOST
;
run;
data temp;
set have;
count=_n_;
pid=prxparse('/\w+\s+\d+\s+unit\b/i');
s=1;e=length(drugs);
call prxnext(pid,s,e,drugs,p,l);
do while(p>0);
var=substr(drugs,p,l);
call prxnext(pid,s,e,drugs,p,l);
output;
end;
drop pid s  e p l ;
run;
proc transpose data=temp out=want;
by count drugs;
var var;
run;```
Discussion stats
• 4 replies
• 617 views
• 1 like
• 4 in conversation