Hi All,
I have the following dataset:
data have; input id name $ score; datalines; 1 David 45 1 David 74 2 Sam 45 2 Ram 54 3 Bane 87 3 Mary 92 3 Bane 87 4 Dane 23 5 Jenny 87 5 Ken 87 6 Simran1 63 6 Simran2 65 6 Simran3 53 6 Simran4 73 6 Simran5 33 6 Simran6 45 8 Priya 72 ; run;
I want the following output:
Id Name Score
1 David 45 2 Sam 45 3 Mary 92 4 Dane 23 5 Jenny 87 6 Simran5 33 8 Priya 72
I have tried the following, but it's not working:
proc sort data=have out=have_srt;
by id;
run;
data want; set have_srt; by id; if first.id then start=_n_; retain start; if last.id then do point=max(Start, _n_ -2) to _n_; set have_srt point=point; output; end; drop start; run;
Please, help.
Hello @Moksha,
Good idea to use the POINT= option. Here is a similar approach:
data want;
set have;
by id;
if last.id then do;
if first.id then output;
else do;
_n_+(-1);
set have point=_n_;
output;
end;
end;
run;
The same result can be achieved with this simpler, but likely less efficient code:
data want;
set have;
by id;
if last.id then do;
_n_+(first.id-1);
set have point=_n_;
output;
end;
run;
Hello @Moksha,
Good idea to use the POINT= option. Here is a similar approach:
data want;
set have;
by id;
if last.id then do;
if first.id then output;
else do;
_n_+(-1);
set have point=_n_;
output;
end;
end;
run;
The same result can be achieved with this simpler, but likely less efficient code:
data want;
set have;
by id;
if last.id then do;
_n_+(first.id-1);
set have point=_n_;
output;
end;
run;
Thank you very much @FreelanceReinh . It's very helpful and solved the problem.
data have;
input id name $ score;
datalines;
1 David 45
1 David 74
2 Sam 45
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran1 63
6 Simran2 65
6 Simran3 53
6 Simran4 73
6 Simran5 33
6 Simran6 45
8 Priya 72
;
run;
data want;
merge have have(firstobs=2 keep=id rename=(id=_id)) have(firstobs=3 keep=id rename=(id=__id));
if (id=_id and id ne __id) or (id ne _id and id ne lag(id));
drop _id __id;
run;
Thank you @Ksharp , it's very helpful.
I like the use of merge and the firstobs= dataset name parameter proposed by @Ksharp .
Here's a minor tweak (firstobs=3 only) to simplify a bit further:
data have;
input id name $ score ;
datalines;
1 David 45
1 David 74
2 Sam 45
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran1 63
6 Simran2 65
6 Simran3 53
6 Simran4 73
6 Simran5 33
6 Simran6 45
8 Priya 72
;
run;
data want (drop=nxt:);
set have ;
by id;
merge have (keep=id) have (firstobs=3 keep=id rename=(id=nxt2_id));
if (first.id=1 and last.id=1) or (last.id=0 and id^=nxt2_id);
run;
Hey @Moksha
You already have a nice set of solutions, but just for fun, two more:
one, that uses hash table and backward-reading of data (this one allows to replace 2nd-last with 3rd-last, 4th-last, etc. by setting the WHICH variable):
data want;
which=2; /* which "n last" from the group */
drop which;
/* hash table to store the list of observations numbers */
declare hash H(ordered:"A"); /* to keep order of IDs */
H.defineKey("id");
H.defineData("p", "_N_");
H.defineDone();
declare hiter I("H");
/* read data backward and get obs number for "n last" from the group */
do point=nobs to 1 by -1;
set have nobs=nobs point=point;
if H.find() then _N_=1; /* counter of obs in a group (counted from the last one) */
else _N_+1;
if _N_ <= which then /* take the closes to the one you want, e.g., if only 2 obs in group will select 2nd, etc.*/
do;
p=point; /* obs to select */
H.replace();
end;
end;
/* loops over selected observations */
do while(I.next()=0);
set have point=p;
output;
end;
stop;
run;
second, that uses backward reading (result is reversed):
data want;
do point=nobs to 1 by -1;
set have point=point nobs=nobs;
output;
end;
stop;
data want;
set want;
by descending ID;
if first.ID then second=0;
second+1;
if first.ID*last.ID OR second=2 then output;
drop second;
run;
Bart
This is a general program that provides for any offset (macrovar OFL) from the last observation:
%let OFL=3; /*Offset from last 0=last, 1=last but 1, etc. */
data want (drop=_: label="Nearest Obs to &OFL before last");
set have (in=firstpass) have (in=secondpass);
by id;
if first.id then call missing (_n1, _n2);
_n1+firstpass; *Count firstpass obs for this ID;
_n2+secondpass;
if dif(secondpass)=1 then _n1=max(1,_n1-&ofl); *Reset _N1 to target obs *;
if secondpass=1 and _n2=_n1;
run;
You control the desired offset value with the
%LET OFL=value-goes-here ;
assignment of a value to macrovar OFL.
The DIF(secondpass) =1 condition (where dif(secondpass) is defined as secondpass-lag(secondpass) is a way to identify the start of the second pass through each ID.
Edit note: Thanks to @FreelanceReinh, I modified the "if dif(secondpass)=1 ..." statement, so that in the event of an underpopulated ID, the target obs is the first (I had it as the last) obs in the ID group.
Another "self-interleaving" masterclass. Kudos, @mkeintz!
After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or
if dif(secondpass)=1 then _n1=max(1,_n1-&ofl); *Reset N1 to target obs *;
for short?
@FreelanceReinh wrote:
Another "self-interleaving" masterclass. Kudos, @mkeintz!
After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or
if dif(secondpass)=1 then _n1=max(1,_n1-&ofl); *Reset N1 to target obs *;for short?
I think you're right. I think I misinterpreted what the OP wants when the ID is underpopulated. They want the first, not the last obs.
I'm going to change my response.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.