I have a dataset like this:
Fname lname id score
--------------------------
John smith. 100 a
John smith. 100 b
Tom. Fonda. 200 a
Tom. Fonda. 300 b
Mila. Fonda. 400 a
.....
If same ppl apear more then once but with different id i randomly select only one record of that person but if they appear multiple times with the same id then they all supposed to get selected. Also single records get selected too. How do i achieve this. Anyone please?
By same ppl i mean same first and last name
Please try
data have;
input Fname$ lname$ id score$;
cards;
John smith. 100 a
John smith. 100 b
Tom. Fonda. 200 a
Tom. Fonda. 300 b
Mila. Fonda. 400 a
;
proc sort data=have;
by Fname lname id;
run;
data want(drop=seq) id(keep=Fname lname id seq);
set have;
by Fname lname id;
if first.id then seq=1;
else seq+1;
output want;
if last.id then output id;
run;
data all;
merge want id;
by Fname lname id;
if seq >=1;
run;
Thanks,
Jag
It's late, so I don't guarantee that the following meets all of your stated conditions. Besides, that's your job to make sure that code does what you want it to do.
However, to understand the following code, you may have to do some Google searches to understand how a double-dow works.
data have;
input Fname$ lname$ id score$;
cards;
John smith. 100 a
John smith. 100 b
Tom. Fonda. 200 a
Tom. Fonda. 300 b
Mila. Fonda. 400 a
;
proc sort data=have;
by Fname lname id;
run;
data want;
do until (last.lname);
set have;
by Fname lname id;
if first.lname then nameseq=1;
else nameseq+1;
if first.id then idseq=1;
else idseq+1;
end;
do until (last.lname);
set have;
by Fname lname id;
random=ranuni(0);
output;
end;
run;
proc sort data=want;
by Fname lname random;
run;
data want (drop=nameseq idseq random);
set want;
by Fname lname;
if nameseq ge 2 and idseq ge 2 then output;
else if nameseq eq 1 and idseq eq 1 then output;
else if first.lname then output;
run;
Hi Tal,
I would like to update the reply from Jagadishkatam,
Please try with following
Data have;
input FNAME$ LNAME$ ID SCORE$;
datalines;
John smith. 100 a
John smith. 100 b
Tom. Fonda. 200 a
Tom. Fonda. 300 b
Mila. Fonda. 400 a
Mila. smith. 500 a
Zen Fonda. 600 b
Tom. smith. 700 c
;
run;
proc sort
data=have;
by FNAME LNAME ID;
run;
data want;
set have;
by FNAME LNAME ID;
if first.ID then ISEQ=1;
else ISEQ+1;
if first.LNAME then LSEQ=1;
else LSEQ+1;
run;
data all (drop=ISEQ LSEQ);
merge want have;
by FNAME LNAME ID;
if ISEQ eq LSEQ;
run;
Hope it will work for you,
Please share if any abnormal observations.
Thank you Jagadishkatam for hint
Thanks and Regards,
Yogesh
Hi,
Well, not sure about your logic of "i randomly select only one record", would need to be some kind of logical selection. In my example below I take the smallest for instance - first part gets records with all the same, second gets the different ones:
proc sql;
create table WANT as
select A.FNAME,
A.LNAME,
A.ID,
A.SCORE
from WORK.HAVE A
where not exists(select distinct FNAME from WORK.HAVE where FNAME=A.FNAME and LNAME=A.LNAME and ID ne A.ID)
union all
select FNAME,
LNAME,
ID,
SCORE
from (select * from WORK.HAVE B where exists(select distinct FNAME from WORK.HAVE where FNAME=B.FNAME and LNAME=B.LNAME and ID ne B.ID))
group by FNAME,LNAME
having SCORE=min(SCORE);
quit;
luckely i dont need to do this but thanks for the approaches guys .
will go over them just for practice
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.