BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
N0o9r5a
Fluorite | Level 6

Hello everyone, I have a question about how to find/flag the second-to-last observation by the same ID. 

 

The dataset I have like this:

IDAssessment dateStatus Status_new1Status_new2
12012-06-19NoYesYes
12012-06-20YesYesYes
12012-06-21NoYesYes
22012-06-22.NoNo
22012-06-23NoNoNo
22012-06-24.NoNo
22012-06-25NoNoNo
32012-06-19UnknownYesYes
32012-06-20YesYesYes
32012-06-21NoYesYes
32012-06-22.YesYes
32012-06-23NoYesYes
42012-06-01UnknownNoUnknown
42012-06-03.NoUnknown
42012-06-05NoNoUnknown
52012-06-08NoNoUnknown

 

The observations that I would like to find/flag is where I used red color:

IDAssessment dateStatus Status_new1Status_new2count
12012-06-19NoYesYes1
12012-06-20YesYesYes2
12012-06-21NoYesYes3
22012-06-22.NoNo1
22012-06-23NoNoNo2
22012-06-24.NoNo3
22012-06-25NoNoNo4
32012-06-19UnknownYesYes1
32012-06-20YesYesYes2
32012-06-21NoYesYes3
32012-06-22.YesYes4
32012-06-23NoYesYes5
42012-06-01UnknownNoUnknown1
42012-06-03.NoUnknown2
42012-06-05NoNoUnknown3
52012-06-08NoNoUnknown1

 

I am able to create a new variable, named "count" to get a serial number by ID. But I don't know how to find the second to last observation, because the obs are not fixed. Some IDs could only have 1 ob, some IDs could 3 to 10 obs. If there is a way that can create a new variable to flag every second-to-last observation by ID in the same dataset, that would be great!

 

Thank you for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Another way to write @mkeintz 's clever logic.

data WANT (drop=_:);
  set HAVE (in=FIRSTPASS) 
      HAVE ;
  by ID;
  if FIRSTPASS then do;
    if first.ID then call missing(_N_FIRSTPASS,_N_SECONDPASS);
    _N_FIRSTPASS+1;
    delete;
  end;
  _N_SECONDPASS+1; 
  FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1);
run;

 

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

Like this?  [Updated to cater for last group]

data HAVE; 
  input ID;
cards;
1 
1 
1 
1 
2 
2 
2 
2 
3
4
4
;
data WANT;
  set HAVE nobs=NOBS;
  by ID;
  if _N_<=NOBS-2 then set HAVE(keep=ID rename=(ID=ID2) firstobs=3);
else ID2=.; if ID ne ID2 and ^last.ID then FLAG=1; run;

ID FLAG
1 .
1 .
1 1
1 .
2 .
2 .
2 1
2 .
3 .
4 1
4 .

 

N0o9r5a
Fluorite | Level 6

Hello @ChrisNZ , I just tried your code, it works perfectly for subjects who had more than 1 obs, it doesn't work for subjects who only had 1 ob. It's good enough for me! Thank you!

mkeintz
PROC Star

This is why it's a good idea to clearly describe what you want, instead of just a general statement ("second to last"), with sample data whose treatment is not in the description.  I now see for ID=5 that you want to flag singletons.  So change @ChrisNZ 's code from

 

  FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1) ;

to

  FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1) or _N_FIRSTPASS=1;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
N0o9r5a
Fluorite | Level 6

Hello @mkeintz , I just tried your code based on @ChrisNZ modified version. It worked perfectly! It also worked for subjects who only had one ob. Thank you for your help! I really appreciate it!

ChrisNZ
Tourmaline | Level 20

> it doesn't work for subjects who only had 1 ob.

It works exactly as you asked. As @mkeintz said, make sure your requirements are described properly when asking a question.

Glad you got what you need at the end.

mkeintz
PROC Star

Read all observations for each ID twice  - set have (in=firstpass) have (in=secondpass); by id;.  The first pass to get a total count of observations (_n_firstpass).  The second time to generate the COUNT variable, set a flag when COUNT=_N_FIRSTPASS-1, and output.  

 

This code is untested, in the absence of sample data presented in the form of a working data step:

 

data want (drop=_:);
  set have (in=firstpass)  have (in=secondpass);
  by id;
  if first.id then call missing(_n_firstpass,count);
  if firstpass then _n_firstpass+1;

  if secondpass;
  count+1;
  /* superseded by below: flag=ifn(count=_n_firstpass-1,1,0) ; */
  flag=ifn(count=_n_firstpass-1,1,0) or _n_firstpass=1;
run;

 

The single statement

 

  set have (in=firstpass)  have (in=secondpass);

would read all of HAVE, and then reread all of HAVE.  But combining it with a BY statement, as in:

 

  set have (in=firstpass)  have (in=secondpass);
  by id;

would read one ID group from have, and then re-read the same group.  Then it would proceed to the second group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ChrisNZ
Tourmaline | Level 20

Another way to write @mkeintz 's clever logic.

data WANT (drop=_:);
  set HAVE (in=FIRSTPASS) 
      HAVE ;
  by ID;
  if FIRSTPASS then do;
    if first.ID then call missing(_N_FIRSTPASS,_N_SECONDPASS);
    _N_FIRSTPASS+1;
    delete;
  end;
  _N_SECONDPASS+1; 
  FLAG=(_N_SECONDPASS=_N_FIRSTPASS-1);
run;

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 4795 views
  • 6 likes
  • 3 in conversation