Hello New SAS User community, I am working in a long data set with several hundred people where every participant (var id) was prescribed an antibiotic for 10 days with each line representing 1 day (var day). The dose (var dose) may change by day. I am trying to create a variable that captures the maximum duration of each dosing change As you can see in the table below, id #1 had 4 changes over time (600mg for 4 days, 0mg for 1 day, 300mg for 3 days, and then 600mg for 2 days) and id #2 had no changes. Using the data step, created a variable 'count_state' that enumerates the number of days at each dose over time, restarting when there is a change. Help: I would like to create a new variable (bold column below) that is the maximum of count_state for each dose before there is any change.
Thank you very much in advance!
id | day | dose | count_state | max_count_state |
1 | 1 | 600 | 1 | 4 |
1 | 2 | 600 | 2 | 4 |
1 | 3 | 600 | 3 | 4 |
1 | 4 | 600 | 4 | 4 |
1 | 5 | 0 | 1 | 1 |
1 | 6 | 300 | 1 | 3 |
1 | 7 | 300 | 2 | 3 |
1 | 8 | 300 | 3 | 3 |
1 | 9 | 600 | 1 | 2 |
1 | 10 | 600 | 2 | 2 |
2 | 1 | 600 | 1 | 10 |
2 | 2 | 600 | 2 | 10 |
2 | 3 | 600 | 3 | 10 |
2 | 4 | 600 | 4 | 10 |
2 | 5 | 600 | 5 | 10 |
2 | 6 | 600 | 6 | 10 |
2 | 7 | 600 | 7 | 10 |
2 | 8 | 600 | 8 | 10 |
2 | 9 | 600 | 9 | 10 |
2 | 10 | 600 | 10 | 10 |
First let's convert your LISTING into an actual DATASET. While we are at it let's name your last two derived variables as EXPECT_COUNT and EXPECT_MAX so we have something to use to check our programs results.
data have;
input id day dose expect_count expect_max;
cards;
1 1 600 1 4
1 2 600 2 4
1 3 600 3 4
1 4 600 4 4
1 5 0 1 1
1 6 300 1 3
1 7 300 2 3
1 8 300 3 3
1 9 600 1 2
1 10 600 2 2
2 1 600 1 10
2 2 600 2 10
2 3 600 3 10
2 4 600 4 10
2 5 600 5 10
2 6 600 6 10
2 7 600 7 10
2 8 600 8 10
2 9 600 9 10
2 10 600 10 10
;
To identify when the DOSE changes you can use BY statement with the NOTSORTED keyword.
data step1;
set have;
by id dose notsorted;
if first.id then group=0;
if first.dose then do;
group+1; count=0;
end;
count+1;
run;
Once you have this grouping variable finding the MAX of COUNT is simple. PROC SQL makes this easy because it will remerge aggregate results back onto the detailed observations.
proc sql;
create table want as
select *,max(count) as max_count
from step1
group by id,group
order by id,day
;
quit;
Result:
expect_ expect_ max_ Obs id day dose count max group count count 1 1 1 600 1 4 1 1 4 2 1 2 600 2 4 1 2 4 3 1 3 600 3 4 1 3 4 4 1 4 600 4 4 1 4 4 5 1 5 0 1 1 2 1 1 6 1 6 300 1 3 3 1 3 7 1 7 300 2 3 3 2 3 8 1 8 300 3 3 3 3 3 9 1 9 600 1 2 4 1 2 10 1 10 600 2 2 4 2 2 11 2 1 600 1 10 1 1 10 12 2 2 600 2 10 1 2 10 13 2 3 600 3 10 1 3 10 14 2 4 600 4 10 1 4 10 15 2 5 600 5 10 1 5 10 16 2 6 600 6 10 1 6 10 17 2 7 600 7 10 1 7 10 18 2 8 600 8 10 1 8 10 19 2 9 600 9 10 1 9 10 20 2 10 600 10 10 1 10 10
Note that COUNT is not really needed since the MAX of COUNT is the same thing as the number of observations in the group.
data step1;
set have(keep=id day dose);
by id dose notsorted;
if first.id then group=0;
group+first.dose;
run;
proc sql;
create table want as
select *,count(*) as max_count
from step1
group by id,group
order by id,day
;
quit;
Here's how I would do it. In a DATA step, identify each place where the dose changes (so count_state=1) and turn this into its own group, with a unique group number. Then determine the max value of count_state in each group. Merge the max in with the original data.
/* UNTESTED CODE */
data intermediate;
set have;
if count_state=1 then group+1;
run;
proc summary data=intermediate nway;
class group;
var count_state;
output out=maxx max=max_count_state;
run;
data want;
merge intermediate maxx;
by group;
run;
If you want tested code, we need to see data presented as working SAS data step code (examples and instructions). Data copied and pasted from Excel, or Excel files, are not acceptable, and we cannot test code provided via copy/paste or in Excel files.
thank you!
First let's convert your LISTING into an actual DATASET. While we are at it let's name your last two derived variables as EXPECT_COUNT and EXPECT_MAX so we have something to use to check our programs results.
data have;
input id day dose expect_count expect_max;
cards;
1 1 600 1 4
1 2 600 2 4
1 3 600 3 4
1 4 600 4 4
1 5 0 1 1
1 6 300 1 3
1 7 300 2 3
1 8 300 3 3
1 9 600 1 2
1 10 600 2 2
2 1 600 1 10
2 2 600 2 10
2 3 600 3 10
2 4 600 4 10
2 5 600 5 10
2 6 600 6 10
2 7 600 7 10
2 8 600 8 10
2 9 600 9 10
2 10 600 10 10
;
To identify when the DOSE changes you can use BY statement with the NOTSORTED keyword.
data step1;
set have;
by id dose notsorted;
if first.id then group=0;
if first.dose then do;
group+1; count=0;
end;
count+1;
run;
Once you have this grouping variable finding the MAX of COUNT is simple. PROC SQL makes this easy because it will remerge aggregate results back onto the detailed observations.
proc sql;
create table want as
select *,max(count) as max_count
from step1
group by id,group
order by id,day
;
quit;
Result:
expect_ expect_ max_ Obs id day dose count max group count count 1 1 1 600 1 4 1 1 4 2 1 2 600 2 4 1 2 4 3 1 3 600 3 4 1 3 4 4 1 4 600 4 4 1 4 4 5 1 5 0 1 1 2 1 1 6 1 6 300 1 3 3 1 3 7 1 7 300 2 3 3 2 3 8 1 8 300 3 3 3 3 3 9 1 9 600 1 2 4 1 2 10 1 10 600 2 2 4 2 2 11 2 1 600 1 10 1 1 10 12 2 2 600 2 10 1 2 10 13 2 3 600 3 10 1 3 10 14 2 4 600 4 10 1 4 10 15 2 5 600 5 10 1 5 10 16 2 6 600 6 10 1 6 10 17 2 7 600 7 10 1 7 10 18 2 8 600 8 10 1 8 10 19 2 9 600 9 10 1 9 10 20 2 10 600 10 10 1 10 10
Note that COUNT is not really needed since the MAX of COUNT is the same thing as the number of observations in the group.
data step1;
set have(keep=id day dose);
by id dose notsorted;
if first.id then group=0;
group+first.dose;
run;
proc sql;
create table want as
select *,count(*) as max_count
from step1
group by id,group
order by id,day
;
quit;
Thank you for this super clear explanation. This was very educational for me!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.