Solved: Long data set: Add column that is maximum duration of dose for periods...

sofia_de_garay · Posted 07-01-2025 02:19 PM

Hello New SAS User community, I am working in a long data set with several hundred people where every participant (var id) was prescribed an antibiotic for 10 days with each line representing 1 day (var day). The dose (var dose) may change by day. I am trying to create a variable that captures the maximum duration of each dosing change As you can see in the table below, id #1 had 4 changes over time (600mg for 4 days, 0mg for 1 day, 300mg for 3 days, and then 600mg for 2 days) and id #2 had no changes. Using the data step, created a variable 'count_state' that enumerates the number of days at each dose over time, restarting when there is a change. Help: I would like to create a new variable (bold column below) that is the maximum of count_state for each dose before there is any change.

Thank you very much in advance!

id	day	dose	count_state	max_count_state
1	1	600	1	4
1	2	600	2	4
1	3	600	3	4
1	4	600	4	4
1	5	0	1	1
1	6	300	1	3
1	7	300	2	3
1	8	300	3	3
1	9	600	1	2
1	10	600	2	2
2	1	600	1	10
2	2	600	2	10
2	3	600	3	10
2	4	600	4	10
2	5	600	5	10
2	6	600	6	10
2	7	600	7	10
2	8	600	8	10
2	9	600	9	10
2	10	600	10	10

Tom · Posted 07-01-2025 04:17 PM

First let's convert your LISTING into an actual DATASET. While we are at it let's name your last two derived variables as EXPECT_COUNT and EXPECT_MAX so we have something to use to check our programs results.

Spoiler

data have;
  input id day dose expect_count expect_max;
cards;
1 1 600 1 4
1 2 600 2 4
1 3 600 3 4
1 4 600 4 4
1 5 0 1 1
1 6 300 1 3
1 7 300 2 3
1 8 300 3 3
1 9 600 1 2
1 10 600 2 2
2 1 600 1 10
2 2 600 2 10
2 3 600 3 10
2 4 600 4 10
2 5 600 5 10
2 6 600 6 10
2 7 600 7 10
2 8 600 8 10
2 9 600 9 10
2 10 600 10 10
;

data have; input id day dose expect_count expect_max; cards; 1 1 600 1 4 1 2 600 2 4 1 3 600 3 4 1 4 600 4 4 1 5 0 1 1 1 6 300 1 3 1 7 300 2 3 1 8 300 3 3 1 9 600 1 2 1 10 600 2 2 2 1 600 1 10 2 2 600 2 10 2 3 600 3 10 2 4 600 4 10 2 5 600 5 10 2 6 600 6 10 2 7 600 7 10 2 8 600 8 10 2 9 600 9 10 2 10 600 10 10 ;

To identify when the DOSE changes you can use BY statement with the NOTSORTED keyword.

data step1;
  set have;
  by id dose notsorted;
  if first.id then group=0;
  if first.dose then do;
    group+1; count=0;
  end;
  count+1;
run;

Once you have this grouping variable finding the MAX of COUNT is simple. PROC SQL makes this easy because it will remerge aggregate results back onto the detailed observations.

proc sql;
create table want as
select *,max(count) as max_count
from step1
group by id,group
order by id,day
;
quit;

Result:

                            expect_    expect_                       max_
Obs    id    day    dose     count       max      group    count    count

  1     1      1     600        1          4        1         1        4
  2     1      2     600        2          4        1         2        4
  3     1      3     600        3          4        1         3        4
  4     1      4     600        4          4        1         4        4
  5     1      5       0        1          1        2         1        1
  6     1      6     300        1          3        3         1        3
  7     1      7     300        2          3        3         2        3
  8     1      8     300        3          3        3         3        3
  9     1      9     600        1          2        4         1        2
 10     1     10     600        2          2        4         2        2
 11     2      1     600        1         10        1         1       10
 12     2      2     600        2         10        1         2       10
 13     2      3     600        3         10        1         3       10
 14     2      4     600        4         10        1         4       10
 15     2      5     600        5         10        1         5       10
 16     2      6     600        6         10        1         6       10
 17     2      7     600        7         10        1         7       10
 18     2      8     600        8         10        1         8       10
 19     2      9     600        9         10        1         9       10
 20     2     10     600       10         10        1        10       10

Note that COUNT is not really needed since the MAX of COUNT is the same thing as the number of observations in the group.

data step1;
  set have(keep=id day dose);
  by id dose notsorted;
  if first.id then group=0;
  group+first.dose;
run;

proc sql;
create table want as
select *,count(*) as max_count
from step1
group by id,group
order by id,day
;
quit;

View solution in original post

PaigeMiller · Posted 07-01-2025 02:40 PM

Here's how I would do it. In a DATA step, identify each place where the dose changes (so count_state=1) and turn this into its own group, with a unique group number. Then determine the max value of count_state in each group. Merge the max in with the original data.

/* UNTESTED CODE */
data intermediate;
    set have;
    if count_state=1 then group+1;
run;
proc summary data=intermediate nway;
    class group;
    var count_state;
    output out=maxx max=max_count_state;
run;
data want;
   merge intermediate maxx;
   by group;
run;

If you want tested code, we need to see data presented as working SAS data step code (examples and instructions). Data copied and pasted from Excel, or Excel files, are not acceptable, and we cannot test code provided via copy/paste or in Excel files.

--
Paige Miller

sofia_de_garay · Posted 07-02-2025 08:12 AM

thank you!

Tom · Posted 07-01-2025 04:17 PM

First let's convert your LISTING into an actual DATASET. While we are at it let's name your last two derived variables as EXPECT_COUNT and EXPECT_MAX so we have something to use to check our programs results.

Spoiler

data have;
  input id day dose expect_count expect_max;
cards;
1 1 600 1 4
1 2 600 2 4
1 3 600 3 4
1 4 600 4 4
1 5 0 1 1
1 6 300 1 3
1 7 300 2 3
1 8 300 3 3
1 9 600 1 2
1 10 600 2 2
2 1 600 1 10
2 2 600 2 10
2 3 600 3 10
2 4 600 4 10
2 5 600 5 10
2 6 600 6 10
2 7 600 7 10
2 8 600 8 10
2 9 600 9 10
2 10 600 10 10
;

data have; input id day dose expect_count expect_max; cards; 1 1 600 1 4 1 2 600 2 4 1 3 600 3 4 1 4 600 4 4 1 5 0 1 1 1 6 300 1 3 1 7 300 2 3 1 8 300 3 3 1 9 600 1 2 1 10 600 2 2 2 1 600 1 10 2 2 600 2 10 2 3 600 3 10 2 4 600 4 10 2 5 600 5 10 2 6 600 6 10 2 7 600 7 10 2 8 600 8 10 2 9 600 9 10 2 10 600 10 10 ;

To identify when the DOSE changes you can use BY statement with the NOTSORTED keyword.

data step1;
  set have;
  by id dose notsorted;
  if first.id then group=0;
  if first.dose then do;
    group+1; count=0;
  end;
  count+1;
run;

Once you have this grouping variable finding the MAX of COUNT is simple. PROC SQL makes this easy because it will remerge aggregate results back onto the detailed observations.

proc sql;
create table want as
select *,max(count) as max_count
from step1
group by id,group
order by id,day
;
quit;

Result:

                            expect_    expect_                       max_
Obs    id    day    dose     count       max      group    count    count

  1     1      1     600        1          4        1         1        4
  2     1      2     600        2          4        1         2        4
  3     1      3     600        3          4        1         3        4
  4     1      4     600        4          4        1         4        4
  5     1      5       0        1          1        2         1        1
  6     1      6     300        1          3        3         1        3
  7     1      7     300        2          3        3         2        3
  8     1      8     300        3          3        3         3        3
  9     1      9     600        1          2        4         1        2
 10     1     10     600        2          2        4         2        2
 11     2      1     600        1         10        1         1       10
 12     2      2     600        2         10        1         2       10
 13     2      3     600        3         10        1         3       10
 14     2      4     600        4         10        1         4       10
 15     2      5     600        5         10        1         5       10
 16     2      6     600        6         10        1         6       10
 17     2      7     600        7         10        1         7       10
 18     2      8     600        8         10        1         8       10
 19     2      9     600        9         10        1         9       10
 20     2     10     600       10         10        1        10       10

Note that COUNT is not really needed since the MAX of COUNT is the same thing as the number of observations in the group.

data step1;
  set have(keep=id day dose);
  by id dose notsorted;
  if first.id then group=0;
  group+first.dose;
run;

proc sql;
create table want as
select *,count(*) as max_count
from step1
group by id,group
order by id,day
;
quit;

sofia_de_garay · Posted 07-02-2025 08:12 AM

Thank you for this super clear explanation. This was very educational for me!

Long data set: Add column that is maximum duration of dose for periods over time

Re: Long data set: Add column that is maximum duration of dose for periods over time

Re: Long data set: Add column that is maximum duration of dose for periods over time

Re: Long data set: Add column that is maximum duration of dose for periods over time

Re: Long data set: Add column that is maximum duration of dose for periods over time

Re: Long data set: Add column that is maximum duration of dose for periods over time

Long data set: Add column that is maximum duration of dose for periods over time

Re: Long data set: Add column that is maximum duration of dose for periods over time

Re: Long data set: Add column that is maximum duration of dose for periods over time

Re: Long data set: Add column that is maximum duration of dose for periods over time

Re: Long data set: Add column that is maximum duration of dose for periods over time

Re: Long data set: Add column that is maximum duration of dose for periods over time

Registration is open