I have 2 datasets, call them data1 and data2. They both have the same set of variables, one of them being ID.
data2 is a subset of data1. I want to create a variable in data1 that takes on the value of 1 when the ID variable is present in data2, and 0 if not in data2.
What is the most efficient way to do this?
Efficient means no sorting.
Like this ?
data WANT;
if _N_ = 1 then do;
declare hash VET(dataset: "DATA2");
VET.definekey("KEYVAR");
VET.definedone();
end;
set DATA1;
FLAG=VAT.check();
run;
If needed sort datasets by ID then use merge:
data want;
merge data1(in=in1)
data2(in=in2 keep=ID)
; by ID;
if in1 and in2 then flag=1;
else flag=2;
run;
Hello - thanks for the response. I tried this and the merged dataset has duplicate observations. Is there a way to just add a variable, flag, to data1, with value 1 if the ID variable is in data2, an value 0 if not? (without doing a merge)?
Efficient means no sorting.
Like this ?
data WANT;
if _N_ = 1 then do;
declare hash VET(dataset: "DATA2");
VET.definekey("KEYVAR");
VET.definedone();
end;
set DATA1;
FLAG=VAT.check();
run;
The function check returns 0 if the key was found in the hash object.
FLAG = (VAT.check() = 0);
Thank you both! worked perfectly with the edit from andreas.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.