08-17-2015 05:21 AM
Please assist. What is the most optimal way to check if two variables exist in 2 separate datasets. The file sizes I am dealing with is quite large - around 50million records in each dataset.
Table 1 Table 2
Output dataset required:
A0001 EXISTS in BOTH TABLES
A0002 DOES NOT EXIST
A0003 EXISTS in BOTH TABLES
A0004 EXISTS in BOTH TABLES
A0005 EXISTS in BOTH TABLES
A0006 DOES NOT EXIST
Thank you for your assistance.
08-17-2015 05:34 AM
Well, SQL probably isn't the best for this, so try a datastep merge on distinct values:
proc sort data=data1 out=tmp1 nodupkey;
proc sort data=data2 out=tmp2 nodupkey;
merge tmp1 (in=a) tmp2 (in=b);
length status $50;
if a and b then status="In Both";
else if a then status="Only in A";
else status="Only in B";
It will take a while to run though just due to the quantity of data.
08-17-2015 07:25 AM
Are we talking about variables or values?