BookmarkSubscribeRSS Feed
Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

I have a dataset like this:

Fname lname id score

--------------------------

John    smith.  100 a

John    smith.  100 b

Tom.   Fonda.  200 a

Tom.   Fonda.  300 b

Mila.   Fonda.   400 a

.....

If same ppl apear more then once but with different id i randomly select only one record of that person but if they appear multiple times with the same id then they all supposed to get selected. Also single records get selected too. How do i achieve this. Anyone please?

By same ppl i mean same first and last name

5 REPLIES 5
Jagadishkatam
Amethyst | Level 16

Please try

data have;

input Fname$ lname$ id score$;

cards;

John    smith.  100 a

John    smith.  100 b

Tom.   Fonda.  200 a

Tom.   Fonda.  300 b

Mila.   Fonda.   400 a

;

proc sort data=have;

by Fname lname id;

run;

data want(drop=seq) id(keep=Fname lname id seq);

set have;

by Fname lname id;

if first.id then seq=1;

else seq+1;

output want;

if last.id then output id;

run;

data all;

  merge want id;

  by Fname lname id;

  if seq >=1;

run;

Thanks,

Jag

Thanks,
Jag
art297
Opal | Level 21

It's late, so I don't guarantee that the following meets all of your stated conditions. Besides, that's your job to make sure that code does what you want it to do.

However, to understand the following code, you may have to do some Google searches to understand how a double-dow works.

data have;

  input Fname$ lname$ id score$;

  cards;

John    smith.  100 a

John    smith.  100 b

Tom.   Fonda.  200 a

Tom.   Fonda.  300 b

Mila.   Fonda.   400 a

;

proc sort data=have;

  by Fname lname id;

run;

data want;

  do until (last.lname);

    set have;

    by Fname lname id;

    if first.lname then nameseq=1;

    else nameseq+1;

    if first.id then idseq=1;

    else idseq+1;

  end;  

  do until (last.lname);

    set have;

    by Fname lname id;

    random=ranuni(0);

    output;

  end;  

run;

proc sort data=want;

  by Fname lname random;

run;

data want (drop=nameseq idseq random);

  set want;

  by Fname lname;

  if nameseq ge 2 and idseq ge 2 then output;

  else if nameseq eq 1 and idseq eq 1 then output;

  else if first.lname then output;

run;

shubhayog
Obsidian | Level 7

Hi Tal,

I would like to update the reply from Jagadishkatam,

Please try with following

Data have;

  input FNAME$ LNAME$ ID SCORE$;

  datalines;

  John    smith.  100 a

  John    smith.  100 b

  Tom.   Fonda.  200 a

  Tom.   Fonda.  300 b

  Mila.   Fonda.   400 a

  Mila. smith. 500 a

  Zen Fonda. 600 b

  Tom. smith. 700 c

  ;

run;

proc sort

  data=have;

  by FNAME LNAME ID;

run;

data want;

  set have;

  by FNAME LNAME ID;

  if first.ID then ISEQ=1;

  else ISEQ+1;

  if first.LNAME then LSEQ=1;

  else LSEQ+1;

run;

data all (drop=ISEQ LSEQ);

  merge want have;

  by FNAME LNAME ID;

  if ISEQ eq LSEQ;

run;

Hope it will work for you,

Please share if any abnormal observations.

Thank you Jagadishkatam for hint Smiley Happy

Thanks and Regards,

Yogesh

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Well, not sure about your logic of "i randomly select only one record", would need to be some kind of logical selection.  In my example below I take the smallest for instance - first part gets records with all the same, second gets the different ones:

proc sql;

  create table WANT as

  select  A.FNAME,

          A.LNAME,

          A.ID,

          A.SCORE

  from    WORK.HAVE A

  where   not exists(select distinct FNAME from WORK.HAVE where FNAME=A.FNAME and LNAME=A.LNAME and ID ne A.ID)

  union all

  select  FNAME,

          LNAME,

          ID,

          SCORE

  from    (select * from WORK.HAVE B where exists(select distinct FNAME from WORK.HAVE where FNAME=B.FNAME and LNAME=B.LNAME and ID ne B.ID))

  group by FNAME,LNAME

  having  SCORE=min(SCORE);

quit;

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

luckely i dont need to do this  but thanks for the approaches  guys .

will  go over  them  just for practice Smiley Happy

sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!

Register now

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1638 views
  • 10 likes
  • 5 in conversation