Solved: Re: second last observation in a dataset

Moksha · Posted 03-23-2024 05:44 AM

Hi All,

I have the following dataset:

data have;
input id name $ score;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;

I want the following output:

Id  Name    Score
1  David    45
2  Sam      45
3  Mary     92
4  Dane     23
5  Jenny    87
6  Simran5  33
8  Priya    72

I have tried the following, but it's not working:

proc sort data=have out=have_srt;
by id;
run;

data want;
	set have_srt;
	by id;

	if first.id then start=_n_;
	retain start;

	if last.id then do point=max(Start, _n_ -2) to _n_;
		set have_srt point=point;
		output;
	end;
	drop start;
run;

Please, help.

FreelanceReinh · Posted 03-23-2024 06:06 AM

Hello @Moksha,

Good idea to use the POINT= option. Here is a similar approach:

data want;
set have;
by id;
if last.id then do;
  if first.id then output;
  else do;
    _n_+(-1);
    set have point=_n_;
    output;
  end;
end;
run;

The same result can be achieved with this simpler, but likely less efficient code:

data want;
set have;
by id;
if last.id then do;
  _n_+(first.id-1);
  set have point=_n_;
  output;
end;
run;

View solution in original post

FreelanceReinh · Posted 03-23-2024 06:06 AM

Hello @Moksha,

Good idea to use the POINT= option. Here is a similar approach:

data want;
set have;
by id;
if last.id then do;
  if first.id then output;
  else do;
    _n_+(-1);
    set have point=_n_;
    output;
  end;
end;
run;

The same result can be achieved with this simpler, but likely less efficient code:

data want;
set have;
by id;
if last.id then do;
  _n_+(first.id-1);
  set have point=_n_;
  output;
end;
run;

Moksha · Posted 03-23-2024 06:12 AM

Thank you very much @FreelanceReinh . It's very helpful and solved the problem.

Ksharp · Posted 03-23-2024 06:18 AM

data have;
input id name $ score;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;
data want;
 merge have have(firstobs=2 keep=id rename=(id=_id)) have(firstobs=3 keep=id rename=(id=__id));
 if (id=_id and id ne __id) or (id ne _id and id ne lag(id));
 drop _id __id;
run;

Moksha · Posted 03-23-2024 08:07 AM

Thank you @Ksharp , it's very helpful.

mkeintz · Posted 03-23-2024 07:40 AM

I like the use of merge and the firstobs= dataset name parameter proposed by @Ksharp .

Here's a minor tweak (firstobs=3 only) to simplify a bit further:

data have;
input id name $ score ;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;
data want (drop=nxt:);
  set have ;
  by id;
  merge have (keep=id)  have (firstobs=3 keep=id rename=(id=nxt2_id));

  if (first.id=1 and last.id=1) or (last.id=0 and id^=nxt2_id);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Moksha · Posted 03-23-2024 08:08 AM

Thank you @mkeintz . It's very helpful.

I thank all, I got to learn here different methods.

yabwon · Posted 03-23-2024 01:07 PM

Hey @Moksha

You already have a nice set of solutions, but just for fun, two more:

one, that uses hash table and backward-reading of data (this one allows to replace 2nd-last with 3rd-last, 4th-last, etc. by setting the WHICH variable):

data want;
  which=2; /* which "n last" from the group */
  drop which;

  /* hash table to store the list of observations numbers */
  declare hash H(ordered:"A"); /* to keep order of IDs */
    H.defineKey("id");
    H.defineData("p", "_N_");
    H.defineDone();
  declare hiter I("H");

  /* read data backward and get obs number for "n last" from the group */
  do point=nobs to 1 by -1; 
    set have nobs=nobs point=point;

    if H.find() then _N_=1; /* counter of obs in a group (counted from the last one) */
                else _N_+1;

    if _N_ <= which then /* take the closes to the one you want, e.g., if only 2 obs in group will select 2nd, etc.*/
      do;
        p=point; /* obs to select */
        H.replace();
      end;
  end;
  
  /* loops over selected observations */
  do while(I.next()=0); 
    set have point=p;
    output;
  end;
stop;
run;

second, that uses backward reading (result is reversed):

data want;
  do point=nobs to 1 by -1;
    set have point=point nobs=nobs;
    output;
  end;
  stop;
data want;
  set want;
  by descending ID;
  if first.ID then second=0;
  second+1;
  if first.ID*last.ID OR second=2 then output;
  drop second;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

mkeintz · Posted 03-23-2024 05:40 PM

This is a general program that provides for any offset (macrovar OFL) from the last observation:

%let OFL=3;   /*Offset from last 0=last, 1=last but 1, etc. */
data want (drop=_: label="Nearest Obs to &OFL before last");
  set have (in=firstpass)  have (in=secondpass);
  by id;
  if first.id then call missing (_n1, _n2);
  _n1+firstpass;    *Count firstpass obs for this ID;
  _n2+secondpass;

  if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset _N1 to target obs *;
  if secondpass=1 and _n2=_n1;
run;

You control the desired offset value with the

%LET OFL=value-goes-here ;

assignment of a value to macrovar OFL.

The DIF(secondpass) =1 condition (where dif(secondpass) is defined as secondpass-lag(secondpass) is a way to identify the start of the second pass through each ID.

Edit note: Thanks to @FreelanceReinh, I modified the "if dif(secondpass)=1 ..." statement, so that in the event of an underpopulated ID, the target obs is the first (I had it as the last) obs in the ID group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

FreelanceReinh · Posted 03-23-2024 07:23 PM

Another "self-interleaving" masterclass. Kudos, @mkeintz!

After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or

if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset N1 to target obs *;

for short?

mkeintz · Posted 03-23-2024 09:55 PM

@FreelanceReinh wrote:

Another "self-interleaving" masterclass. Kudos, @mkeintz!

After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or
if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset N1 to target obs *;
for short?

I think you're right. I think I misinterpreted what the OP wants when the ID is underpopulated. They want the first, not the last obs.

I'm going to change my response.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Training: Just a Click Away