Help using Base SAS procedures

To identify the un-matching records

Reply
Contributor
Posts: 57

To identify the un-matching records

Hi,

I have 2 datasets-dset1 o and dset2 t, each with variables var1 var2 and var3

I wanted to create a third dataset dset3 with 2 variables var1 and var2. Dset3 should contain the records where var2 has a match in both dset1 and dset2 and var1 from dset2 which does not have a match in dset1.

I'm dealing with 2 huge datasets. So I'll crash the system if I attempt to do multiple iteration.

Please Help.

Thanks in advance.
Sandhya.
Super Contributor
Super Contributor
Posts: 3,174

Re: To identify the un-matching records

Use SAS PROC SORT on each input file, followed by a DATA step with a MERGE and a BY statement. And, also you will want to code the IN= dataset option for each file named on the SET statement -- then reference those IN= SAS variables to determine what input file actually contributes to the MERGE / BY group processing. By searching the SAS support website http://support.sas.com/ you will find code examples for this objective as you have described.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic / post:

data step programming by group processing site:sas.com
Contributor
Posts: 57

Re: To identify the un-matching records

Thanks for the suggestion.

But my datasets has a volume of more than 2 million records. So I'm looking for minimum iteration technique.

Thanks,
Sandy.
Contributor
Posts: 33

Re: To identify the un-matching records

If I understand what you're looking for correctly, you want to end up with a data set that has records that var2 is the same in your contributing data sets but var1 is not? Is that right? If so, this could work for you.

PROC SORT the two data sets by var2.
Then do a DATA / MERGE / BY / IF.

DATA DSET3;
MERGE DSET1 (IN=A) DSET2 (IN=B RENAME=(VAR1=DSET2VAR1));
BY VAR2;
IF A AND B;
IF VAR1=DSET2VAR1 THEN DELETE;

This puts all the records into DSET3 where they have the same VAR2 and then takes out those where the VAR1 is the same.

SAS doesn't care how many records you have - only issue would be your processor / storage / work space.
Super User
Posts: 17,840

Re: To identify the un-matching records

You can try a sql merge (untested). From a processing perspective I'm not sure which is faster or more efficient. But you can do it without sorting.

proc sql;
create table dset3 as
select var1, var2
from dset1 a
join dset2 b
on a.var2=b.var2
and a.var1 ne b.var1;
quit; Message was edited by: Reeza
Super User
Posts: 9,681

Re: To identify the un-matching records

Hi.
Maybe using proc format to change your dataset1 into format will be a good choice,
This method is efficient.Suggest posting some dummy data to see how to code it.

Ksharp
Respected Advisor
Posts: 3,894

Re: To identify the un-matching records

Hi Sandhya

If datasets with 2M records are huge and will crash the system depends on the environment you're in. For some of us 2M is not that huge.

It looks to me as if this is a once off task so performance will not be that important.

I therefore believe that you should give Scott's suggestion a go as it is easy to implement and less memory intensive than you might think.

You could use the sort option NOEQUALS to allow Proc Sort to use a simpler sort algorithm. Just give it a try.

If this anser is not good enough for you then have a look at these papers:
http://www2.sas.com/proceedings/sugi26/p104-26.pdf
http://support.sas.com/resources/papers/proceedings09/197-2009.pdf

Using indexes could be an alternate way to go.

HTH
Patrick
Valued Guide
Posts: 2,175

Re: To identify the un-matching records

since you want to avoid sorting, I presume your data are not in an order useful to the query. Consider a hash table. How wide and how many are the keys? If the keys of table1 can be stored with indexes on var1 and var2, then you should be able to collect your query results with a single pass through each table.
However, sort/merge might take less time to get results if you have to addin the learning curve.

good luck
peterC
Ask a Question
Discussion stats
  • 7 replies
  • 222 views
  • 0 likes
  • 7 in conversation