Re: need help with this pls

Tal · Posted 02-25-2015 06:38 PM

I have a dataset like this:

Fname lname id score

--------------------------

John smith. 100 a

John smith. 100 b

Tom. Fonda. 200 a

Tom. Fonda. 300 b

Mila. Fonda. 400 a

.....

If same ppl apear more then once but with different id i randomly select only one record of that person but if they appear multiple times with the same id then they all supposed to get selected. Also single records get selected too. How do i achieve this. Anyone please?

By same ppl i mean same first and last name

Jagadishkatam · Posted 02-25-2015 10:54 PM

Please try

data have;

input Fname$ lname$ id score$;

cards;

John smith. 100 a

John smith. 100 b

Tom. Fonda. 200 a

Tom. Fonda. 300 b

Mila. Fonda. 400 a

;

proc sort data=have;

by Fname lname id;

run;

data want(drop=seq) id(keep=Fname lname id seq);

set have;

by Fname lname id;

if first.id then seq=1;

else seq+1;

output want;

if last.id then output id;

run;

data all;

merge want id;

by Fname lname id;

if seq >=1;

run;

Thanks,

Jag

Thanks,
Jag

art297 · Posted 02-26-2015 12:45 AM

It's late, so I don't guarantee that the following meets all of your stated conditions. Besides, that's your job to make sure that code does what you want it to do.

However, to understand the following code, you may have to do some Google searches to understand how a double-dow works.

data have;

input Fname$ lname$ id score$;

cards;

John smith. 100 a

John smith. 100 b

Tom. Fonda. 200 a

Tom. Fonda. 300 b

Mila. Fonda. 400 a

;

proc sort data=have;

by Fname lname id;

run;

data want;

do until (last.lname);

set have;

by Fname lname id;

if first.lname then nameseq=1;

else nameseq+1;

if first.id then idseq=1;

else idseq+1;

end;

do until (last.lname);

set have;

by Fname lname id;

random=ranuni(0);

output;

end;

run;

proc sort data=want;

by Fname lname random;

run;

data want (drop=nameseq idseq random);

set want;

by Fname lname;

if nameseq ge 2 and idseq ge 2 then output;

else if nameseq eq 1 and idseq eq 1 then output;

else if first.lname then output;

run;

shubhayog · Posted 02-26-2015 07:12 AM

Hi Tal,

I would like to update the reply from Jagadishkatam,

Please try with following

Data have;

input FNAME$ LNAME$ ID SCORE$;

datalines;

John smith. 100 a

John smith. 100 b

Tom. Fonda. 200 a

Tom. Fonda. 300 b

Mila. Fonda. 400 a

Mila. smith. 500 a

Zen Fonda. 600 b

Tom. smith. 700 c

;

run;

proc sort

data=have;

by FNAME LNAME ID;

run;

data want;

set have;

by FNAME LNAME ID;

if first.ID then ISEQ=1;

else ISEQ+1;

if first.LNAME then LSEQ=1;

else LSEQ+1;

run;

data all (drop=ISEQ LSEQ);

merge want have;

by FNAME LNAME ID;

if ISEQ eq LSEQ;

run;

Hope it will work for you,

Please share if any abnormal observations.

Thank you Jagadishkatam for hint

Thanks and Regards,

Yogesh

RW9 · Posted 02-26-2015 08:17 AM

Hi,

Well, not sure about your logic of "i randomly select only one record", would need to be some kind of logical selection. In my example below I take the smallest for instance - first part gets records with all the same, second gets the different ones:

proc sql;

create table WANT as

select A.FNAME,

A.LNAME,

A.ID,

A.SCORE

from WORK.HAVE A

where not exists(select distinct FNAME from WORK.HAVE where FNAME=A.FNAME and LNAME=A.LNAME and ID ne A.ID)

union all

select FNAME,

LNAME,

ID,

SCORE

from (select * from WORK.HAVE B where exists(select distinct FNAME from WORK.HAVE where FNAME=B.FNAME and LNAME=B.LNAME and ID ne B.ID))

group by FNAME,LNAME

having SCORE=min(SCORE);

quit;

Tal · Posted 03-03-2015 10:09 AM

luckely i dont need to do this but thanks for the approaches guys .

will go over them just for practice

need help with this pls