I have 2 datasets, call them data1 and data2. They both have the same set of variables, one of them being ID.
data2 is a subset of data1. I want to create a variable in data1 that takes on the value of 1 when the ID variable is present in data2, and 0 if not in data2.
What is the most efficient way to do this?
Efficient means no sorting.
Like this ?
data WANT;
if _N_ = 1 then do;
declare hash VET(dataset: "DATA2");
VET.definekey("KEYVAR");
VET.definedone();
end;
set DATA1;
FLAG=VAT.check();
run;
If needed sort datasets by ID then use merge:
data want;
merge data1(in=in1)
data2(in=in2 keep=ID)
; by ID;
if in1 and in2 then flag=1;
else flag=2;
run;
Hello - thanks for the response. I tried this and the merged dataset has duplicate observations. Is there a way to just add a variable, flag, to data1, with value 1 if the ID variable is in data2, an value 0 if not? (without doing a merge)?
Efficient means no sorting.
Like this ?
data WANT;
if _N_ = 1 then do;
declare hash VET(dataset: "DATA2");
VET.definekey("KEYVAR");
VET.definedone();
end;
set DATA1;
FLAG=VAT.check();
run;
The function check returns 0 if the key was found in the hash object.
FLAG = (VAT.check() = 0);
Thank you both! worked perfectly with the edit from andreas.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.