Hello everyone, I have a question about how to find/flag the second-to-last observation by the same ID.
The dataset I have like this:
ID | Assessment date | Status | Status_new1 | Status_new2 |
1 | 2012-06-19 | No | Yes | Yes |
1 | 2012-06-20 | Yes | Yes | Yes |
1 | 2012-06-21 | No | Yes | Yes |
2 | 2012-06-22 | . | No | No |
2 | 2012-06-23 | No | No | No |
2 | 2012-06-24 | . | No | No |
2 | 2012-06-25 | No | No | No |
3 | 2012-06-19 | Unknown | Yes | Yes |
3 | 2012-06-20 | Yes | Yes | Yes |
3 | 2012-06-21 | No | Yes | Yes |
3 | 2012-06-22 | . | Yes | Yes |
3 | 2012-06-23 | No | Yes | Yes |
4 | 2012-06-01 | Unknown | No | Unknown |
4 | 2012-06-03 | . | No | Unknown |
4 | 2012-06-05 | No | No | Unknown |
5 | 2012-06-08 | No | No | Unknown |
The observations that I would like to find/flag is where I used red color:
ID | Assessment date | Status | Status_new1 | Status_new2 | count |
1 | 2012-06-19 | No | Yes | Yes | 1 |
1 | 2012-06-20 | Yes | Yes | Yes | 2 |
1 | 2012-06-21 | No | Yes | Yes | 3 |
2 | 2012-06-22 | . | No | No | 1 |
2 | 2012-06-23 | No | No | No | 2 |
2 | 2012-06-24 | . | No | No | 3 |
2 | 2012-06-25 | No | No | No | 4 |
3 | 2012-06-19 | Unknown | Yes | Yes | 1 |
3 | 2012-06-20 | Yes | Yes | Yes | 2 |
3 | 2012-06-21 | No | Yes | Yes | 3 |
3 | 2012-06-22 | . | Yes | Yes | 4 |
3 | 2012-06-23 | No | Yes | Yes | 5 |
4 | 2012-06-01 | Unknown | No | Unknown | 1 |
4 | 2012-06-03 | . | No | Unknown | 2 |
4 | 2012-06-05 | No | No | Unknown | 3 |
5 | 2012-06-08 | No | No | Unknown | 1 |
I am able to create a new variable, named "count" to get a serial number by ID. But I don't know how to find the second to last observation, because the obs are not fixed. Some IDs could only have 1 ob, some IDs could 3 to 10 obs. If there is a way that can create a new variable to flag every second-to-last observation by ID in the same dataset, that would be great!
Thank you for your help!
Another way to write @mkeintz 's clever logic.
data WANT (drop=_:);
set HAVE (in=FIRSTPASS)
HAVE ;
by ID;
if FIRSTPASS then do;
if first.ID then call missing(_N_FIRSTPASS,_N_SECONDPASS);
_N_FIRSTPASS+1;
delete;
end;
_N_SECONDPASS+1;
FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1);
run;
Like this? [Updated to cater for last group]
data HAVE;
input ID;
cards;
1
1
1
1
2
2
2
2
3
4
4
;
data WANT;
set HAVE nobs=NOBS;
by ID;
if _N_<=NOBS-2 then set HAVE(keep=ID rename=(ID=ID2) firstobs=3);
else ID2=.;
if ID ne ID2 and ^last.ID then FLAG=1;
run;
ID | FLAG |
---|---|
1 | . |
1 | . |
1 | 1 |
1 | . |
2 | . |
2 | . |
2 | 1 |
2 | . |
3 | . |
4 | 1 |
4 | . |
Hello @ChrisNZ , I just tried your code, it works perfectly for subjects who had more than 1 obs, it doesn't work for subjects who only had 1 ob. It's good enough for me! Thank you!
This is why it's a good idea to clearly describe what you want, instead of just a general statement ("second to last"), with sample data whose treatment is not in the description. I now see for ID=5 that you want to flag singletons. So change @ChrisNZ 's code from
FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1) ;
to
FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1) or _N_FIRSTPASS=1;
> it doesn't work for subjects who only had 1 ob.
It works exactly as you asked. As @mkeintz said, make sure your requirements are described properly when asking a question.
Glad you got what you need at the end.
Read all observations for each ID twice - set have (in=firstpass) have (in=secondpass); by id;. The first pass to get a total count of observations (_n_firstpass). The second time to generate the COUNT variable, set a flag when COUNT=_N_FIRSTPASS-1, and output.
This code is untested, in the absence of sample data presented in the form of a working data step:
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
by id;
if first.id then call missing(_n_firstpass,count);
if firstpass then _n_firstpass+1;
if secondpass;
count+1;
/* superseded by below: flag=ifn(count=_n_firstpass-1,1,0) ; */
flag=ifn(count=_n_firstpass-1,1,0) or _n_firstpass=1;
run;
The single statement
set have (in=firstpass) have (in=secondpass);
would read all of HAVE, and then reread all of HAVE. But combining it with a BY statement, as in:
set have (in=firstpass) have (in=secondpass);
by id;
would read one ID group from have, and then re-read the same group. Then it would proceed to the second group.
Another way to write @mkeintz 's clever logic.
data WANT (drop=_:);
set HAVE (in=FIRSTPASS)
HAVE ;
by ID;
if FIRSTPASS then do;
if first.ID then call missing(_N_FIRSTPASS,_N_SECONDPASS);
_N_FIRSTPASS+1;
delete;
end;
_N_SECONDPASS+1;
FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.