SAS Programming

Moksha · Posted 03-23-2024 05:44 AM

Hi All,

I have the following dataset:

data have;
input id name $ score;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;

I want the following output:

Id  Name    Score
1  David    45
2  Sam      45
3  Mary     92
4  Dane     23
5  Jenny    87
6  Simran5  33
8  Priya    72

I have tried the following, but it's not working:

proc sort data=have out=have_srt;
by id;
run;

data want;
	set have_srt;
	by id;

	if first.id then start=_n_;
	retain start;

	if last.id then do point=max(Start, _n_ -2) to _n_;
		set have_srt point=point;
		output;
	end;
	drop start;
run;

Please, help.

FreelanceReinh · Posted 03-23-2024 06:06 AM

Hello @Moksha,

Good idea to use the POINT= option. Here is a similar approach:

data want;
set have;
by id;
if last.id then do;
  if first.id then output;
  else do;
    _n_+(-1);
    set have point=_n_;
    output;
  end;
end;
run;

The same result can be achieved with this simpler, but likely less efficient code:

data want;
set have;
by id;
if last.id then do;
  _n_+(first.id-1);
  set have point=_n_;
  output;
end;
run;

View solution in original post

FreelanceReinh · Posted 03-23-2024 06:06 AM

Hello @Moksha,

Good idea to use the POINT= option. Here is a similar approach:

data want;
set have;
by id;
if last.id then do;
  if first.id then output;
  else do;
    _n_+(-1);
    set have point=_n_;
    output;
  end;
end;
run;

The same result can be achieved with this simpler, but likely less efficient code:

data want;
set have;
by id;
if last.id then do;
  _n_+(first.id-1);
  set have point=_n_;
  output;
end;
run;

Moksha · Posted 03-23-2024 06:12 AM

Thank you very much @FreelanceReinh . It's very helpful and solved the problem.

Ksharp · Posted 03-23-2024 06:18 AM

data have;
input id name $ score;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;
data want;
 merge have have(firstobs=2 keep=id rename=(id=_id)) have(firstobs=3 keep=id rename=(id=__id));
 if (id=_id and id ne __id) or (id ne _id and id ne lag(id));
 drop _id __id;
run;

Moksha · Posted 03-23-2024 08:07 AM

Thank you @Ksharp , it's very helpful.

mkeintz · Posted 03-23-2024 07:40 AM

I like the use of merge and the firstobs= dataset name parameter proposed by @Ksharp .

Here's a minor tweak (firstobs=3 only) to simplify a bit further:

data have;
input id name $ score ;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;
data want (drop=nxt:);
  set have ;
  by id;
  merge have (keep=id)  have (firstobs=3 keep=id rename=(id=nxt2_id));

  if (first.id=1 and last.id=1) or (last.id=0 and id^=nxt2_id);
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Moksha · Posted 03-23-2024 08:08 AM

Thank you @mkeintz . It's very helpful.

I thank all, I got to learn here different methods.

yabwon · Posted 03-23-2024 01:07 PM

Hey @Moksha

You already have a nice set of solutions, but just for fun, two more:

one, that uses hash table and backward-reading of data (this one allows to replace 2nd-last with 3rd-last, 4th-last, etc. by setting the WHICH variable):

data want;
  which=2; /* which "n last" from the group */
  drop which;

  /* hash table to store the list of observations numbers */
  declare hash H(ordered:"A"); /* to keep order of IDs */
    H.defineKey("id");
    H.defineData("p", "_N_");
    H.defineDone();
  declare hiter I("H");

  /* read data backward and get obs number for "n last" from the group */
  do point=nobs to 1 by -1; 
    set have nobs=nobs point=point;

    if H.find() then _N_=1; /* counter of obs in a group (counted from the last one) */
                else _N_+1;

    if _N_ <= which then /* take the closes to the one you want, e.g., if only 2 obs in group will select 2nd, etc.*/
      do;
        p=point; /* obs to select */
        H.replace();
      end;
  end;
  
  /* loops over selected observations */
  do while(I.next()=0); 
    set have point=p;
    output;
  end;
stop;
run;

second, that uses backward reading (result is reversed):

data want;
  do point=nobs to 1 by -1;
    set have point=point nobs=nobs;
    output;
  end;
  stop;
data want;
  set want;
  by descending ID;
  if first.ID then second=0;
  second+1;
  if first.ID*last.ID OR second=2 then output;
  drop second;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

mkeintz · Posted 03-23-2024 05:40 PM

This is a general program that provides for any offset (macrovar OFL) from the last observation:

%let OFL=3;   /*Offset from last 0=last, 1=last but 1, etc. */
data want (drop=_: label="Nearest Obs to &OFL before last");
  set have (in=firstpass)  have (in=secondpass);
  by id;
  if first.id then call missing (_n1, _n2);
  _n1+firstpass;    *Count firstpass obs for this ID;
  _n2+secondpass;

  if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset _N1 to target obs *;
  if secondpass=1 and _n2=_n1;
run;

You control the desired offset value with the

%LET OFL=value-goes-here ;

assignment of a value to macrovar OFL.

The DIF(secondpass) =1 condition (where dif(secondpass) is defined as secondpass-lag(secondpass) is a way to identify the start of the second pass through each ID.

Edit note: Thanks to @FreelanceReinh, I modified the "if dif(secondpass)=1 ..." statement, so that in the event of an underpopulated ID, the target obs is the first (I had it as the last) obs in the ID group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

FreelanceReinh · Posted 03-23-2024 07:23 PM

Another "self-interleaving" masterclass. Kudos, @mkeintz!

After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or

if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset N1 to target obs *;

for short?

mkeintz · Posted 03-23-2024 09:55 PM

@FreelanceReinh wrote:

Another "self-interleaving" masterclass. Kudos, @mkeintz!

After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or
if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset N1 to target obs *;
for short?

I think you're right. I think I misinterpreted what the OP wants when the ID is underpopulated. They want the first, not the last obs.

I'm going to change my response.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Programming

second last observation in a dataset

Re: second last observation in a dataset

Re: second last observation in a dataset

Re: second last observation in a dataset

Re: second last observation in a dataset

Re: second last observation in a dataset

Re: second last observation in a dataset

Re: second last observation in a dataset

Re: second last observation in a dataset

Re: second last observation in a dataset

Re: second last observation in a dataset

Re: second last observation in a dataset

Output Dataset with Number of Observations Used from Proc Logistic

output observations in one dataset but not in another

Dynamic way of fetching Last 50 observations from dataset

Count Unique Observations

Dataset with double observations

Follow Us

What is...

SAS Programming

Join us for our biggest event of the year!

SAS Training: Just a Click Away

Follow Us

What is...