Hi Guys
Please assist. What is the most optimal way to check if two variables exist in 2 separate datasets. The file sizes I am dealing with is quite large - around 50million records in each dataset.
Table 1 Table 2
A0001 A0001
A0002 A0003
A0003 A0004
A0004 A0005
A0005
A0006
Output dataset required:
ACCOUNT_ID STATUS
A0001 EXISTS in BOTH TABLES
A0002 DOES NOT EXIST
A0003 EXISTS in BOTH TABLES
A0004 EXISTS in BOTH TABLES
A0005 EXISTS in BOTH TABLES
A0006 DOES NOT EXIST
Thank you for your assistance.
Well, SQL probably isn't the best for this, so try a datastep merge on distinct values:
proc sort data=data1 out=tmp1 nodupkey;
by account_id;
run;
proc sort data=data2 out=tmp2 nodupkey;
by account_id;
run;
data want;
merge tmp1 (in=a) tmp2 (in=b);
length status $50;
if a and b then status="In Both";
else if a then status="Only in A";
else status="Only in B";
run;
It will take a while to run though just due to the quantity of data.
Are we talking about variables or values?
Hash Table .
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.