Unique List - multiple datasets

Reply
Occasional Contributor
Posts: 7

Unique List - multiple datasets

Hello,

I have 4 datasets with a variable known as ID in them all.

The ID in one dataset can be present in the other 3 too.

I would like to do the following;

- get the unique list of ID's from each dataset

- output the unique list from each data set into a final dataset called "unique ID"which has the combined unique list.

I know I can do this in a set of datasteps, but is there a macro to do this?

Best,

M.

Super User
Posts: 5,255

Re: Unique List - multiple datasets

And SQL too. And any macro logic you need probably do it yourself. If your lucky you could search on the Internet for a sample to start from.

Data never sleeps
Super Contributor
Posts: 282

Re: Unique List - multiple datasets

Hi,

I don't know of a SAS macro that does what you're asking, but the following (untested) macros might help:

%macro get_ids(inds,outds);

  proc sort data=&inds(keep=id)

            out =&outds

            nodupkey;

    by id;

  run;

%mend get_ids;

%macro get_all_ids(dslist);

  data idlist;

    set &dslist;

  run;

  %get_ids(idlist,uniqueid);

%mend get_all_ids;

%get_ids(set_a,set_a2);

%get_ids(set_b,set_b2);

%get_ids(set_c,set_c2);

%get_ids(set_d,set_d2);

%get_ids(set_a2 set_b2 set_c2 set_d2);

Regards,

Amir.

Super User
Super User
Posts: 6,498

Re: Unique List - multiple datasets

Very easy if the datasets are sorted.

data want ;

   set ds1 ds2 ds3 ds4 ;

   by id;

   if first.id;

   keep id;

run;

Super Contributor
Super Contributor
Posts: 440

Re: Unique List - multiple datasets

Like your approach Tom!

options mprint;

%macro test (set1,set2,set3,set4) ;

proc sort data=&set1; by id;run;

proc sort data=&set2; by id;run;

proc sort data=&set3; by id;run;

proc sort data=&set4; by id;run;

data want ;

set &set1 &set2 &set3 &set4;

by id;

if first.id;

keep id;

proc print;run;

%mend test;

%test(a,b,c,d);

Occasional Contributor
Posts: 7

Re: Unique List - multiple datasets

Thank you guys, this was really very helpful, and the unique list generated really helped me do some other tasks that I needed too.

I was able to work with your suggestion, run through 12 datasets (each dataset is a month of a year) and extract a list of unique IDs.

I now have a list if ID's of people who been present at least once in the dataset,

The next step is this:

Sometimes the ID is present twice  or 10 times in a dataset.

I want now to get a one-to-many match for my unique id through all the datatsets.

Here's an example:

Unique ID  List        jan                 feb

------------------          ---------           ------------

ab                               ab                     ef

cd                               ab                     ef

ef                                gh                     ef

gh                               gh                     ef

I want the output to be;

ID      Month  & other variables etc...

----     -------

ab       jan

ab       jan

gh       jan

gh       jan

ef        ef

ef        ef

ef        ef

ef        ef

Super Contributor
Posts: 282

Re: Unique List - multiple datasets

Hi,

How about the following:

data want;

  merge uniqueid

        jan

        feb

  ;

  by id;

  /* if month ne ''; /* uncomment if you only want to see matches */

run;

If order matters then you can sort the output by month and id.

Regards,

Amir.

Ask a Question
Discussion stats
  • 6 replies
  • 332 views
  • 1 like
  • 5 in conversation