Hi All,
I have the following dataset:
data have; input id name $ score; datalines; 1 David 45 1 David 74 2 Sam 45 2 Ram 54 3 Bane 87 3 Mary 92 3 Bane 87 4 Dane 23 5 Jenny 87 5 Ken 87 6 Simran1 63 6 Simran2 65 6 Simran3 53 6 Simran4 73 6 Simran5 33 6 Simran6 45 8 Priya 72 ; run;
I want the following output:
Id Name Score
1 David 45 2 Sam 45 3 Mary 92 4 Dane 23 5 Jenny 87 6 Simran5 33 8 Priya 72
I have tried the following, but it's not working:
proc sort data=have out=have_srt;
by id;
run;
data want; set have_srt; by id; if first.id then start=_n_; retain start; if last.id then do point=max(Start, _n_ -2) to _n_; set have_srt point=point; output; end; drop start; run;
Please, help.
Hello @Moksha,
Good idea to use the POINT= option. Here is a similar approach:
data want;
set have;
by id;
if last.id then do;
if first.id then output;
else do;
_n_+(-1);
set have point=_n_;
output;
end;
end;
run;
The same result can be achieved with this simpler, but likely less efficient code:
data want;
set have;
by id;
if last.id then do;
_n_+(first.id-1);
set have point=_n_;
output;
end;
run;
Hello @Moksha,
Good idea to use the POINT= option. Here is a similar approach:
data want;
set have;
by id;
if last.id then do;
if first.id then output;
else do;
_n_+(-1);
set have point=_n_;
output;
end;
end;
run;
The same result can be achieved with this simpler, but likely less efficient code:
data want;
set have;
by id;
if last.id then do;
_n_+(first.id-1);
set have point=_n_;
output;
end;
run;
Thank you very much @FreelanceReinh . It's very helpful and solved the problem.
data have;
input id name $ score;
datalines;
1 David 45
1 David 74
2 Sam 45
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran1 63
6 Simran2 65
6 Simran3 53
6 Simran4 73
6 Simran5 33
6 Simran6 45
8 Priya 72
;
run;
data want;
merge have have(firstobs=2 keep=id rename=(id=_id)) have(firstobs=3 keep=id rename=(id=__id));
if (id=_id and id ne __id) or (id ne _id and id ne lag(id));
drop _id __id;
run;
Thank you @Ksharp , it's very helpful.
I like the use of merge and the firstobs= dataset name parameter proposed by @Ksharp .
Here's a minor tweak (firstobs=3 only) to simplify a bit further:
data have;
input id name $ score ;
datalines;
1 David 45
1 David 74
2 Sam 45
2 Ram 54
3 Bane 87
3 Mary 92
3 Bane 87
4 Dane 23
5 Jenny 87
5 Ken 87
6 Simran1 63
6 Simran2 65
6 Simran3 53
6 Simran4 73
6 Simran5 33
6 Simran6 45
8 Priya 72
;
run;
data want (drop=nxt:);
set have ;
by id;
merge have (keep=id) have (firstobs=3 keep=id rename=(id=nxt2_id));
if (first.id=1 and last.id=1) or (last.id=0 and id^=nxt2_id);
run;
Hey @Moksha
You already have a nice set of solutions, but just for fun, two more:
one, that uses hash table and backward-reading of data (this one allows to replace 2nd-last with 3rd-last, 4th-last, etc. by setting the WHICH variable):
data want;
which=2; /* which "n last" from the group */
drop which;
/* hash table to store the list of observations numbers */
declare hash H(ordered:"A"); /* to keep order of IDs */
H.defineKey("id");
H.defineData("p", "_N_");
H.defineDone();
declare hiter I("H");
/* read data backward and get obs number for "n last" from the group */
do point=nobs to 1 by -1;
set have nobs=nobs point=point;
if H.find() then _N_=1; /* counter of obs in a group (counted from the last one) */
else _N_+1;
if _N_ <= which then /* take the closes to the one you want, e.g., if only 2 obs in group will select 2nd, etc.*/
do;
p=point; /* obs to select */
H.replace();
end;
end;
/* loops over selected observations */
do while(I.next()=0);
set have point=p;
output;
end;
stop;
run;
second, that uses backward reading (result is reversed):
data want;
do point=nobs to 1 by -1;
set have point=point nobs=nobs;
output;
end;
stop;
data want;
set want;
by descending ID;
if first.ID then second=0;
second+1;
if first.ID*last.ID OR second=2 then output;
drop second;
run;
Bart
This is a general program that provides for any offset (macrovar OFL) from the last observation:
%let OFL=3; /*Offset from last 0=last, 1=last but 1, etc. */
data want (drop=_: label="Nearest Obs to &OFL before last");
set have (in=firstpass) have (in=secondpass);
by id;
if first.id then call missing (_n1, _n2);
_n1+firstpass; *Count firstpass obs for this ID;
_n2+secondpass;
if dif(secondpass)=1 then _n1=max(1,_n1-&ofl); *Reset _N1 to target obs *;
if secondpass=1 and _n2=_n1;
run;
You control the desired offset value with the
%LET OFL=value-goes-here ;
assignment of a value to macrovar OFL.
The DIF(secondpass) =1 condition (where dif(secondpass) is defined as secondpass-lag(secondpass) is a way to identify the start of the second pass through each ID.
Edit note: Thanks to @FreelanceReinh, I modified the "if dif(secondpass)=1 ..." statement, so that in the event of an underpopulated ID, the target obs is the first (I had it as the last) obs in the ID group.
Another "self-interleaving" masterclass. Kudos, @mkeintz!
After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or
if dif(secondpass)=1 then _n1=max(1,_n1-&ofl); *Reset N1 to target obs *;
for short?
@FreelanceReinh wrote:
Another "self-interleaving" masterclass. Kudos, @mkeintz!
After analyzing the clever logic, I would have expected 1, not _n1, in the second argument of the IFN function or
if dif(secondpass)=1 then _n1=max(1,_n1-&ofl); *Reset N1 to target obs *;for short?
I think you're right. I think I misinterpreted what the OP wants when the ID is underpopulated. They want the first, not the last obs.
I'm going to change my response.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.