I want to make the days listed in the "Comments" column as it's own column. I would prefer each day to be it's own row.
Example: If COMMENTS = "No assessment made on Day 16, Day 17, Day 19" then I would extract DAYS = 16, 17, 19
Something like this may work:
data want;
set have;
prxid=prxparse('/\bDay\s+\d+\b/i');
start=1;
end=length(comments);
do until(0);
call prxnext(prxid,start,end,comments,pos,len);
if pos=0 then leave;
day=input(scan(substr(comments,pos,len),2),8.0);
output;
end;
drop start end pos len prxid;
run;
The PRX expression looks for a word boundary (\b), followed by "Day" and 1 or more whitespaces (blanks or tab characters) (\s+), 1 or more digits (\d+) and a word boundary again. The "i" at the end means to look for e.g. both "day", "DAY" and "Day".
If you want the DAY as a character variable, just drop the INPUT function (and perhaps declare the DAY variable with e.g "length day $8;" as it will otherwise get the same length as the COMMENTS variable).
@mariko5797 wrote:
I want to make the days listed in the "Comments" column as it's own column. I would prefer each day to be it's own row.
Example: If COMMENTS = "No assessment made on Day 16, Day 17, Day 19" then I would extract DAYS = 16, 17, 19
Do you mean:
Example: If COMMENTS = "No assessment made on Day 16, Day 17, Day 19" then I would extract DAYS = 16 on one row, DAYS = 17 on another row, and DAYS= 19 on another row
so you want days to be a character variable with the days from comment variable? could you post some sample data as datelines and desired output?
Something like this may work:
data want;
set have;
prxid=prxparse('/\bDay\s+\d+\b/i');
start=1;
end=length(comments);
do until(0);
call prxnext(prxid,start,end,comments,pos,len);
if pos=0 then leave;
day=input(scan(substr(comments,pos,len),2),8.0);
output;
end;
drop start end pos len prxid;
run;
The PRX expression looks for a word boundary (\b), followed by "Day" and 1 or more whitespaces (blanks or tab characters) (\s+), 1 or more digits (\d+) and a word boundary again. The "i" at the end means to look for e.g. both "day", "DAY" and "Day".
If you want the DAY as a character variable, just drop the INPUT function (and perhaps declare the DAY variable with e.g "length day $8;" as it will otherwise get the same length as the COMMENTS variable).
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.