I have 2 datasets, call them data1 and data2. They both have the same set of variables, one of them being ID.
data2 is a subset of data1. I want to create a variable in data1 that takes on the value of 1 when the ID variable is present in data2, and 0 if not in data2.
What is the most efficient way to do this?
Efficient means no sorting.
Like this ?
data WANT;
if _N_ = 1 then do;
declare hash VET(dataset: "DATA2");
VET.definekey("KEYVAR");
VET.definedone();
end;
set DATA1;
FLAG=VAT.check();
run;
If needed sort datasets by ID then use merge:
data want;
merge data1(in=in1)
data2(in=in2 keep=ID)
; by ID;
if in1 and in2 then flag=1;
else flag=2;
run;
Hello - thanks for the response. I tried this and the merged dataset has duplicate observations. Is there a way to just add a variable, flag, to data1, with value 1 if the ID variable is in data2, an value 0 if not? (without doing a merge)?
Efficient means no sorting.
Like this ?
data WANT;
if _N_ = 1 then do;
declare hash VET(dataset: "DATA2");
VET.definekey("KEYVAR");
VET.definedone();
end;
set DATA1;
FLAG=VAT.check();
run;
The function check returns 0 if the key was found in the hash object.
FLAG = (VAT.check() = 0);
Thank you both! worked perfectly with the edit from andreas.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.