Hello fellow SAS users
I have been working on the problem below over the past few days with very little success, and would very much appreciate your help here.
Attach is a sample dataset in Excel that illustrates the data structure and what I want to achieve.
Variable 'StudentID' is self-explanatory, indicating unique individuals.
Variable 'Year' shows the yearly follow-up, in chronological order. Here, you can see that the duration of follow-up varies for each individual (between 3 and 7 years). In the real dataset I am working on, this varies from 2 to 10 years.
Variable 'RuralPrac52w' indicates the event of interest in that year (1 for Yes, 0 for No).
The final variable 'CorrEventCount' is what I wish to obtain from my coding (and I have not got there yet - which is why I am here).
The logic/rule to create this final variable is quite simple, in plain English. It is similar to the counting process for repeated events in survival analysis. It goes like this:
Within each unique StudentID, if RuralPrac52w=1 then it is 1 count for an event, whether this is for that particular year/row only, or for however many consecutive years. If there is a break in the event series (ie with a 0 in between), the counting will continue to the next event when another 1 occurs. Ultimately, Person 1 has 1 event of a continuous 2-year duration. Person 2 has 2 events, each of 1 year duration. Person 3 has 1 event of 3-year duration. Person 4 has 3 events (1st lasting 2 years, 2nd & 3rd 1 year each). Person 5 has 1 event of 5-year duration.
My two main challenges are: (i) the changeable/unfixed durations of the event within-ID and of the follow-up period between-ID; and (ii) how to continue/resume the counting once the series is interrupted (ie from 1 to 0, or 0 to 1), and keep this process going until the last record for that person.
Below is one of my recent (unsuccessful) syntax versions.
data manyevents_count;
set manyevents;
by StudentID;
retain countEvent;
do i = 1 to last.StudentID;
*set base value: 0 or 1 as approp;
countEvent=RuralPrac52w;
if RuralPrac52w=1 AND lag(lag(RuralPrac52w))=1 then countEvent=1;
if RuralPrac52w=1 AND lag(lag(RuralPrac52w))=0 then countEvent=1;
if RuralPrac52w=0 AND lag(lag(RuralPrac52w))=0 then countEvent=0;
if RuralPrac52w=0 AND lag(lag(RuralPrac52w))=1 then countEvent=1;
end;
run;
Look forward to any help or advice.
data result(drop=lag_RuralPrac52w);
set manyevents(keep=StudentID Year RuralPrac52w);
by StudentID Year;
lag_RuralPrac52w=lag(RuralPrac52w);
select;
when(first.StudentID) CorrEventCount=(RuralPrac52w=1);
when(lag_RuralPrac52w=0 and RuralPrac52w=1) CorrEventCount+1;
otherwise;
end;
run;
data result(drop=lag_RuralPrac52w);
set manyevents(keep=StudentID Year RuralPrac52w);
by StudentID Year;
lag_RuralPrac52w=lag(RuralPrac52w);
select;
when(first.StudentID) CorrEventCount=(RuralPrac52w=1);
when(lag_RuralPrac52w=0 and RuralPrac52w=1) CorrEventCount+1;
otherwise;
end;
run;
Many thanks learsaas! Have tested and it worked!!
Kind regards, H
Hello @humbleSASer and welcome to the SAS Support Communities!
Try this:
data manyevents_count;
set manyevents;
by StudentID RuralPrac52w notsorted;
if first.StudentID then CorrEventCount=0;
CorrEventCount+(first.RuralPrac52w & RuralPrac52w=1);
run;
With the above BY statement the DATA step sees the data as blocks of consecutive observations with the same StudentID which in turn consist of blocks of consecutive observations with the same RuralPrac52w value. (The NOTSORTED option is necessary because the RuralPrac52w values within a StudentID are neither increasing nor decreasing in general.) It is assumed that dataset MANYEVENTS is sorted by StudentID Year.
The IF-THEN statement initializes the count to zero at the first observation of each student. The sum statement (of the general form variable+increment;) increments CorrEventCount by 1 if the Boolean expression in the parentheses is true, i.e., if a new block of RuralPrac52w values (within a student) begins and this is a block with RuralPrac52w=1.
Many thanks FreelanceReinhard! Also tested your syntax and it worked!
Kind regards, H
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.