BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
N0o9r5a
Fluorite | Level 6

Hello everyone, I have a question about how to find/flag the second-to-last observation by the same ID. 

 

The dataset I have like this:

IDAssessment dateStatus Status_new1Status_new2
12012-06-19NoYesYes
12012-06-20YesYesYes
12012-06-21NoYesYes
22012-06-22.NoNo
22012-06-23NoNoNo
22012-06-24.NoNo
22012-06-25NoNoNo
32012-06-19UnknownYesYes
32012-06-20YesYesYes
32012-06-21NoYesYes
32012-06-22.YesYes
32012-06-23NoYesYes
42012-06-01UnknownNoUnknown
42012-06-03.NoUnknown
42012-06-05NoNoUnknown
52012-06-08NoNoUnknown

 

The observations that I would like to find/flag is where I used red color:

IDAssessment dateStatus Status_new1Status_new2count
12012-06-19NoYesYes1
12012-06-20YesYesYes2
12012-06-21NoYesYes3
22012-06-22.NoNo1
22012-06-23NoNoNo2
22012-06-24.NoNo3
22012-06-25NoNoNo4
32012-06-19UnknownYesYes1
32012-06-20YesYesYes2
32012-06-21NoYesYes3
32012-06-22.YesYes4
32012-06-23NoYesYes5
42012-06-01UnknownNoUnknown1
42012-06-03.NoUnknown2
42012-06-05NoNoUnknown3
52012-06-08NoNoUnknown1

 

I am able to create a new variable, named "count" to get a serial number by ID. But I don't know how to find the second to last observation, because the obs are not fixed. Some IDs could only have 1 ob, some IDs could 3 to 10 obs. If there is a way that can create a new variable to flag every second-to-last observation by ID in the same dataset, that would be great!

 

Thank you for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Another way to write @mkeintz 's clever logic.

data WANT (drop=_:);
  set HAVE (in=FIRSTPASS) 
      HAVE ;
  by ID;
  if FIRSTPASS then do;
    if first.ID then call missing(_N_FIRSTPASS,_N_SECONDPASS);
    _N_FIRSTPASS+1;
    delete;
  end;
  _N_SECONDPASS+1; 
  FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1);
run;

 

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

Like this?  [Updated to cater for last group]

data HAVE; 
  input ID;
cards;
1 
1 
1 
1 
2 
2 
2 
2 
3
4
4
;
data WANT;
  set HAVE nobs=NOBS;
  by ID;
  if _N_<=NOBS-2 then set HAVE(keep=ID rename=(ID=ID2) firstobs=3);
else ID2=.; if ID ne ID2 and ^last.ID then FLAG=1; run;

ID FLAG
1 .
1 .
1 1
1 .
2 .
2 .
2 1
2 .
3 .
4 1
4 .

 

N0o9r5a
Fluorite | Level 6

Hello @ChrisNZ , I just tried your code, it works perfectly for subjects who had more than 1 obs, it doesn't work for subjects who only had 1 ob. It's good enough for me! Thank you!

mkeintz
PROC Star

This is why it's a good idea to clearly describe what you want, instead of just a general statement ("second to last"), with sample data whose treatment is not in the description.  I now see for ID=5 that you want to flag singletons.  So change @ChrisNZ 's code from

 

  FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1) ;

to

  FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1) or _N_FIRSTPASS=1;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
N0o9r5a
Fluorite | Level 6

Hello @mkeintz , I just tried your code based on @ChrisNZ modified version. It worked perfectly! It also worked for subjects who only had one ob. Thank you for your help! I really appreciate it!

ChrisNZ
Tourmaline | Level 20

> it doesn't work for subjects who only had 1 ob.

It works exactly as you asked. As @mkeintz said, make sure your requirements are described properly when asking a question.

Glad you got what you need at the end.

mkeintz
PROC Star

Read all observations for each ID twice  - set have (in=firstpass) have (in=secondpass); by id;.  The first pass to get a total count of observations (_n_firstpass).  The second time to generate the COUNT variable, set a flag when COUNT=_N_FIRSTPASS-1, and output.  

 

This code is untested, in the absence of sample data presented in the form of a working data step:

 

data want (drop=_:);
  set have (in=firstpass)  have (in=secondpass);
  by id;
  if first.id then call missing(_n_firstpass,count);
  if firstpass then _n_firstpass+1;

  if secondpass;
  count+1;
  /* superseded by below: flag=ifn(count=_n_firstpass-1,1,0) ; */
  flag=ifn(count=_n_firstpass-1,1,0) or _n_firstpass=1;
run;

 

The single statement

 

  set have (in=firstpass)  have (in=secondpass);

would read all of HAVE, and then reread all of HAVE.  But combining it with a BY statement, as in:

 

  set have (in=firstpass)  have (in=secondpass);
  by id;

would read one ID group from have, and then re-read the same group.  Then it would proceed to the second group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

Another way to write @mkeintz 's clever logic.

data WANT (drop=_:);
  set HAVE (in=FIRSTPASS) 
      HAVE ;
  by ID;
  if FIRSTPASS then do;
    if first.ID then call missing(_N_FIRSTPASS,_N_SECONDPASS);
    _N_FIRSTPASS+1;
    delete;
  end;
  _N_SECONDPASS+1; 
  FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1);
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 4609 views
  • 6 likes
  • 3 in conversation