I have 2 datasets-dset1 o and dset2 t, each with variables var1 var2 and var3
I wanted to create a third dataset dset3 with 2 variables var1 and var2. Dset3 should contain the records where var2 has a match in both dset1 and dset2 and var1 from dset2 which does not have a match in dset1.
I'm dealing with 2 huge datasets. So I'll crash the system if I attempt to do multiple iteration.
Use SAS PROC SORT on each input file, followed by a DATA step with a MERGE and a BY statement. And, also you will want to code the IN= dataset option for each file named on the SET statement -- then reference those IN= SAS variables to determine what input file actually contributes to the MERGE / BY group processing. By searching the SAS support website http://support.sas.com/ you will find code examples for this objective as you have described.
Suggested Google advanced search argument, this topic / post:
data step programming by group processing site:sas.com
If I understand what you're looking for correctly, you want to end up with a data set that has records that var2 is the same in your contributing data sets but var1 is not? Is that right? If so, this could work for you.
PROC SORT the two data sets by var2.
Then do a DATA / MERGE / BY / IF.
MERGE DSET1 (IN=A) DSET2 (IN=B RENAME=(VAR1=DSET2VAR1));
IF A AND B;
IF VAR1=DSET2VAR1 THEN DELETE;
This puts all the records into DSET3 where they have the same VAR2 and then takes out those where the VAR1 is the same.
SAS doesn't care how many records you have - only issue would be your processor / storage / work space.
since you want to avoid sorting, I presume your data are not in an order useful to the query. Consider a hash table. How wide and how many are the keys? If the keys of table1 can be stored with indexes on var1 and var2, then you should be able to collect your query results with a single pass through each table.
However, sort/merge might take less time to get results if you have to addin the learning curve.