Solved: Re: Create a variable based on another dataset

Melk · Posted 06-01-2017 08:18 PM

I have 2 datasets, call them data1 and data2. They both have the same set of variables, one of them being ID.

data2 is a subset of data1. I want to create a variable in data1 that takes on the value of 1 when the ID variable is present in data2, and 0 if not in data2.

What is the most efficient way to do this?

ChrisNZ · Posted 06-01-2017 11:08 PM

Efficient means no sorting.

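Like this?

data WANT; 
  if _N_ = 1 then do;  
    /* Load the IDs from DATA2 into a hash table once */
    declare hash VET(dataset: "DATA2"); 
    VET.definekey("ID"); 
    VET.definedone();
  end; 
  set DATA1; 
  /* check() returns 0 when ID is found, so compare to get a 1/0 flag */
  FLAG = (VET.check() = 0);
run;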
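
To try the hash lookup on something small, here is a minimal sketch. It assumes the key variable is ID, as in the question; the toy values are made up for illustration. The check() method returns 0 when the key is found, so comparing its result to 0 gives the 1/0 flag that was asked for.

```sas
/* Toy data: DATA2 is a subset of DATA1 (values are illustrative only) */
data DATA1;
  input ID @@;
  datalines;
1 2 3 4 5
;
run;

data DATA2;
  input ID @@;
  datalines;
2 4
;
run;

data WANT;
  if _N_ = 1 then do;
    declare hash VET(dataset: "DATA2");  /* load the DATA2 keys once */
    VET.definekey("ID");
    VET.definedone();
  end;
  set DATA1;                             /* read DATA1 in its existing order */
  FLAG = (VET.check() = 0);              /* 1 if ID is in DATA2, otherwise 0 */
run;
```

With this toy data, WANT keeps the five rows of DATA1, with FLAG=1 for IDs 2 and 4 and FLAG=0 for the rest. Neither dataset needs to be sorted, and since DATA1 is only read with a SET statement, no rows are added.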

High-Performance SAS Coding - Third Edition


Shmuel · Posted 06-01-2017 08:32 PM

If needed, sort both datasets by ID, then use merge:

data want;
  merge data1(in=in1)
        data2(in=in2 keep=ID);
  by ID;
  if in1 and in2 then flag=1;
  else flag=0;
run;

Melk · Posted 06-06-2017 07:17 PM

Hello - thanks for the response. I tried this and the merged dataset has duplicate observations. Is there a way to just add a variable, flag, to data1, with value 1 if the ID variable is in data2, and value 0 if not (without doing a merge)?
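
One way to add the flag without a merge is a PROC SQL subquery. This is only a sketch, assuming the datasets are named data1 and data2 and the key is ID, as in the question; the output name want is illustrative. Because the query selects only from data1, it returns one row per data1 row, even if an ID appears more than once in data2.

```sas
proc sql;
  create table want as
  select a.*,
         /* 1 when the ID appears anywhere in data2, otherwise 0 */
         case when a.ID in (select ID from data2) then 1 else 0 end as flag
  from data1 as a;
quit;
```

No sorting is required here either, although PROC SQL does not guarantee that the output rows come back in the original order of data1.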
