DATA Step, Macro, Functions and more

Comparing IDs across tables

Reply
Frequent Contributor
Posts: 138

Comparing IDs across tables

Hi,

I have two datasets. One has many variables, one of which is a unique ID variable (1 row=1 unique person, no dupes). The second dataset is just a list of IDs. What I want to do is create a variable in Dataset 1 for whether a person's ID also appears in Dataset 2. I know how to do this using joins, but I'm hoping there's a simpler way that I don't know about or haven't thought of.

Any help is much appreciated.

Thanks!

Super User
Super User
Posts: 7,997

Re: Comparing IDs across tables

Posted in reply to Walternate

Hi,

/* Note assumes you have a variable FLAG in your original dataset to hold the update */

proc sql;

     update table HAVE

     set     FLAG="Y"

     where ID in (select distinct ID from DATASET2) ;

quit;   

You could also do it by a normal select:

proc sql;

     create table WANT as

     select     A.*,

                    case when exists(select THIS.ID from DATASET2 THIS where THIS.ID=A.ID) then "Y"

                              else "N" end as FLAG

     from       DATASET1;

quit;

Super Contributor
Posts: 298

Re: Comparing IDs across tables

Posted in reply to Walternate

A data step solution:

data A;

input ID X1 X2;

cards;

100  10 20

200  20 30

500  50 70

;

run;

data B;

input ID;

cards;

100

150

200

100

200

300

;

run;

data want;

   if 0 then set B;

   declare hash h(dataset:'B');

   h.definekey('ID');

   h.definedone();

   do until(last);

      set A end = last;

      flag = 'N';

      if h.find() = 0 then flag = 'Y';

      output;

   end;

stop;

run;

Occasional Contributor
Posts: 13

Re: Comparing IDs across tables

I think there might be a very simple solution to this using the built in features of SAS. There is a temporary variable created when you merge data sets called 'in', which keeps track of where a data record comes from. So, you could do something like this:

*sort the two data files;

proc sort data = work.first_file; by id; run;

proc sort data = work.second_file; by id; run;

*Merge the two files, keeping only the records in the first file, but create a variable that indicates if the ID was in the second file;

Data all;

merge work.first_file (in = A)

           work.second_file (in = B);

*create the variable;

  if B = 1 then in_second_file = 1;

*output only the original records;

if A = 1 then output;

run;

Good luck!

Ask a Question
Discussion stats
  • 3 replies
  • 339 views
  • 0 likes
  • 4 in conversation