BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Moksha
Pyrite | Level 9

Hi All,

      I have the following dataset:

data have;
input id name $ score;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;

I want the following output:

 

Id  Name    Score
1 David 45 2 Sam 45 3 Mary 92 4 Dane 23 5 Jenny 87 6 Simran5 33 8 Priya 72

 

I have tried the following, but it's not working:

proc sort data=have out=have_srt;
by id;
run;

data want; set have_srt; by id; if first.id then start=_n_; retain start; if last.id then do point=max(Start, _n_ -2) to _n_; set have_srt point=point; output; end; drop start; run;

Please, help. 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Moksha,

 

Good idea to use the POINT= option. Here is a similar approach:

data want;
set have;
by id;
if last.id then do;
  if first.id then output;
  else do;
    _n_+(-1);
    set have point=_n_;
    output;
  end;
end;
run;

The same result can be achieved with this simpler, but likely less efficient code:

data want;
set have;
by id;
if last.id then do;
  _n_+(first.id-1);
  set have point=_n_;
  output;
end;
run;

View solution in original post

10 REPLIES 10
FreelanceReinh
Jade | Level 19

Hello @Moksha,

 

Good idea to use the POINT= option. Here is a similar approach:

data want;
set have;
by id;
if last.id then do;
  if first.id then output;
  else do;
    _n_+(-1);
    set have point=_n_;
    output;
  end;
end;
run;

The same result can be achieved with this simpler, but likely less efficient code:

data want;
set have;
by id;
if last.id then do;
  _n_+(first.id-1);
  set have point=_n_;
  output;
end;
run;
Moksha
Pyrite | Level 9

Thank you very much  @FreelanceReinh . It's very helpful and solved the problem.

Ksharp
Super User
data have;
input id name $ score;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;
data want;
 merge have have(firstobs=2 keep=id rename=(id=_id)) have(firstobs=3 keep=id rename=(id=__id));
 if (id=_id and id ne __id) or (id ne _id and id ne lag(id));
 drop _id __id;
run;
Moksha
Pyrite | Level 9

Thank you @Ksharp , it's very helpful. 

mkeintz
PROC Star

I like the use of merge and the firstobs= dataset name parameter proposed by @Ksharp .

 

Here's a minor tweak (firstobs=3 only) to simplify a bit further:

 

data have;
input id name $ score ;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;
data want (drop=nxt:);
  set have ;
  by id;
  merge have (keep=id)  have (firstobs=3 keep=id rename=(id=nxt2_id));

  if (first.id=1 and last.id=1) or (last.id=0 and id^=nxt2_id);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Moksha
Pyrite | Level 9

Thank you @mkeintz . It's very helpful.

 

I thank all, I got to learn here different methods.

yabwon
Amethyst | Level 16

Hey @Moksha 

 

You already have a nice set of solutions, but just for fun, two more:

 

one, that uses hash table and backward-reading of data (this one allows to replace 2nd-last with 3rd-last, 4th-last, etc. by setting the WHICH variable):

data want;
  which=2; /* which "n last" from the group */
  drop which;

  /* hash table to store the list of observations numbers */
  declare hash H(ordered:"A"); /* to keep order of IDs */
    H.defineKey("id");
    H.defineData("p", "_N_");
    H.defineDone();
  declare hiter I("H");

  /* read data backward and get obs number for "n last" from the group */
  do point=nobs to 1 by -1; 
    set have nobs=nobs point=point;

    if H.find() then _N_=1; /* counter of obs in a group (counted from the last one) */
                else _N_+1;

    if _N_ <= which then /* take the closes to the one you want, e.g., if only 2 obs in group will select 2nd, etc.*/
      do;
        p=point; /* obs to select */
        H.replace();
      end;
  end;
  
  /* loops over selected observations */
  do while(I.next()=0); 
    set have point=p;
    output;
  end;
stop;
run;

second, that uses backward reading (result is reversed):

data want;
  do point=nobs to 1 by -1;
    set have point=point nobs=nobs;
    output;
  end;
  stop;
data want;
  set want;
  by descending ID;
  if first.ID then second=0;
  second+1;
  if first.ID*last.ID OR second=2 then output;
  drop second;
run;

 

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



mkeintz
PROC Star

This is a general program that provides for any offset (macrovar OFL) from the last observation:

 

%let OFL=3;   /*Offset from last 0=last, 1=last but 1, etc. */
data want (drop=_: label="Nearest Obs to &OFL before last");
  set have (in=firstpass)  have (in=secondpass);
  by id;
  if first.id then call missing (_n1, _n2);
  _n1+firstpass;    *Count firstpass obs for this ID;
  _n2+secondpass;

  if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset _N1 to target obs *;
  if secondpass=1 and _n2=_n1;
run;

You control the desired offset value with the 

    %LET OFL=value-goes-here ;

assignment of a value to macrovar OFL.

 

The DIF(secondpass) =1 condition (where dif(secondpass) is defined as secondpass-lag(secondpass) is a way to identify the start of the second pass through each ID.

 

Edit note:  Thanks to @FreelanceReinh, I modified the "if dif(secondpass)=1 ..." statement, so that in the event of an underpopulated ID, the target obs is the first (I had it as the last) obs in the ID group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
FreelanceReinh
Jade | Level 19

Another "self-interleaving" masterclass. Kudos, @mkeintz!

After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or

if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset N1 to target obs *;

for short?

mkeintz
PROC Star

@FreelanceReinh wrote:

Another "self-interleaving" masterclass. Kudos, @mkeintz!

After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or

if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset N1 to target obs *;

for short?


I think you're right.  I think I misinterpreted what the OP wants when the ID is underpopulated.  They want the first, not the last obs.

 

I'm going to change my response.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3011 views
  • 8 likes
  • 5 in conversation