BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Moksha
Pyrite | Level 9

Hi All,

      I have the following dataset:

data have;
input id name $ score;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;

I want the following output:

 

Id  Name    Score
1 David 45 2 Sam 45 3 Mary 92 4 Dane 23 5 Jenny 87 6 Simran5 33 8 Priya 72

 

I have tried the following, but it's not working:

proc sort data=have out=have_srt;
by id;
run;

data want; set have_srt; by id; if first.id then start=_n_; retain start; if last.id then do point=max(Start, _n_ -2) to _n_; set have_srt point=point; output; end; drop start; run;

Please, help. 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @Moksha,

 

Good idea to use the POINT= option. Here is a similar approach:

data want;
set have;
by id;
if last.id then do;
  if first.id then output;
  else do;
    _n_+(-1);
    set have point=_n_;
    output;
  end;
end;
run;

The same result can be achieved with this simpler, but likely less efficient code:

data want;
set have;
by id;
if last.id then do;
  _n_+(first.id-1);
  set have point=_n_;
  output;
end;
run;

View solution in original post

10 REPLIES 10
FreelanceReinh
Jade | Level 19

Hello @Moksha,

 

Good idea to use the POINT= option. Here is a similar approach:

data want;
set have;
by id;
if last.id then do;
  if first.id then output;
  else do;
    _n_+(-1);
    set have point=_n_;
    output;
  end;
end;
run;

The same result can be achieved with this simpler, but likely less efficient code:

data want;
set have;
by id;
if last.id then do;
  _n_+(first.id-1);
  set have point=_n_;
  output;
end;
run;
Moksha
Pyrite | Level 9

Thank you very much  @FreelanceReinh . It's very helpful and solved the problem.

Ksharp
Super User
data have;
input id name $ score;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;
data want;
 merge have have(firstobs=2 keep=id rename=(id=_id)) have(firstobs=3 keep=id rename=(id=__id));
 if (id=_id and id ne __id) or (id ne _id and id ne lag(id));
 drop _id __id;
run;
Moksha
Pyrite | Level 9

Thank you @Ksharp , it's very helpful. 

mkeintz
PROC Star

I like the use of merge and the firstobs= dataset name parameter proposed by @Ksharp .

 

Here's a minor tweak (firstobs=3 only) to simplify a bit further:

 

data have;
input id name $ score ;
datalines;
1  David    45
1  David    74
2  Sam      45
2  Ram      54
3  Bane     87
3  Mary     92
3  Bane     87
4  Dane     23
5  Jenny    87
5  Ken      87
6  Simran1  63
6  Simran2  65
6  Simran3  53
6  Simran4  73
6  Simran5  33
6  Simran6  45
8  Priya    72
;
run;
data want (drop=nxt:);
  set have ;
  by id;
  merge have (keep=id)  have (firstobs=3 keep=id rename=(id=nxt2_id));

  if (first.id=1 and last.id=1) or (last.id=0 and id^=nxt2_id);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Moksha
Pyrite | Level 9

Thank you @mkeintz . It's very helpful.

 

I thank all, I got to learn here different methods.

yabwon
Onyx | Level 15

Hey @Moksha 

 

You already have a nice set of solutions, but just for fun, two more:

 

one, that uses hash table and backward-reading of data (this one allows to replace 2nd-last with 3rd-last, 4th-last, etc. by setting the WHICH variable):

data want;
  which=2; /* which "n last" from the group */
  drop which;

  /* hash table to store the list of observations numbers */
  declare hash H(ordered:"A"); /* to keep order of IDs */
    H.defineKey("id");
    H.defineData("p", "_N_");
    H.defineDone();
  declare hiter I("H");

  /* read data backward and get obs number for "n last" from the group */
  do point=nobs to 1 by -1; 
    set have nobs=nobs point=point;

    if H.find() then _N_=1; /* counter of obs in a group (counted from the last one) */
                else _N_+1;

    if _N_ <= which then /* take the closes to the one you want, e.g., if only 2 obs in group will select 2nd, etc.*/
      do;
        p=point; /* obs to select */
        H.replace();
      end;
  end;
  
  /* loops over selected observations */
  do while(I.next()=0); 
    set have point=p;
    output;
  end;
stop;
run;

second, that uses backward reading (result is reversed):

data want;
  do point=nobs to 1 by -1;
    set have point=point nobs=nobs;
    output;
  end;
  stop;
data want;
  set want;
  by descending ID;
  if first.ID then second=0;
  second+1;
  if first.ID*last.ID OR second=2 then output;
  drop second;
run;

 

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



mkeintz
PROC Star

This is a general program that provides for any offset (macrovar OFL) from the last observation:

 

%let OFL=3;   /*Offset from last 0=last, 1=last but 1, etc. */
data want (drop=_: label="Nearest Obs to &OFL before last");
  set have (in=firstpass)  have (in=secondpass);
  by id;
  if first.id then call missing (_n1, _n2);
  _n1+firstpass;    *Count firstpass obs for this ID;
  _n2+secondpass;

  if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset _N1 to target obs *;
  if secondpass=1 and _n2=_n1;
run;

You control the desired offset value with the 

    %LET OFL=value-goes-here ;

assignment of a value to macrovar OFL.

 

The DIF(secondpass) =1 condition (where dif(secondpass) is defined as secondpass-lag(secondpass) is a way to identify the start of the second pass through each ID.

 

Edit note:  Thanks to @FreelanceReinh, I modified the "if dif(secondpass)=1 ..." statement, so that in the event of an underpopulated ID, the target obs is the first (I had it as the last) obs in the ID group.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
FreelanceReinh
Jade | Level 19

Another "self-interleaving" masterclass. Kudos, @mkeintz!

After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or

if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset N1 to target obs *;

for short?

mkeintz
PROC Star

@FreelanceReinh wrote:

Another "self-interleaving" masterclass. Kudos, @mkeintz!

After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or

if dif(secondpass)=1 then _n1=max(1,_n1-&ofl);  *Reset N1 to target obs *;

for short?


I think you're right.  I think I misinterpreted what the OP wants when the ID is underpopulated.  They want the first, not the last obs.

 

I'm going to change my response.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1507 views
  • 8 likes
  • 5 in conversation