BookmarkSubscribeRSS Feed
Sandhya
Fluorite | Level 6
Hi,

I have 2 datasets-dset1 o and dset2 t, each with variables var1 var2 and var3

I wanted to create a third dataset dset3 with 2 variables var1 and var2. Dset3 should contain the records where var2 has a match in both dset1 and dset2 and var1 from dset2 which does not have a match in dset1.

I'm dealing with 2 huge datasets. So I'll crash the system if I attempt to do multiple iteration.

Please Help.

Thanks in advance.
Sandhya.
7 REPLIES 7
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Use SAS PROC SORT on each input file, followed by a DATA step with a MERGE and a BY statement. And, also you will want to code the IN= dataset option for each file named on the SET statement -- then reference those IN= SAS variables to determine what input file actually contributes to the MERGE / BY group processing. By searching the SAS support website http://support.sas.com/ you will find code examples for this objective as you have described.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic / post:

data step programming by group processing site:sas.com
Sandhya
Fluorite | Level 6
Thanks for the suggestion.

But my datasets has a volume of more than 2 million records. So I'm looking for minimum iteration technique.

Thanks,
Sandy.
stateworker
Fluorite | Level 6
If I understand what you're looking for correctly, you want to end up with a data set that has records that var2 is the same in your contributing data sets but var1 is not? Is that right? If so, this could work for you.

PROC SORT the two data sets by var2.
Then do a DATA / MERGE / BY / IF.

DATA DSET3;
MERGE DSET1 (IN=A) DSET2 (IN=B RENAME=(VAR1=DSET2VAR1));
BY VAR2;
IF A AND B;
IF VAR1=DSET2VAR1 THEN DELETE;

This puts all the records into DSET3 where they have the same VAR2 and then takes out those where the VAR1 is the same.

SAS doesn't care how many records you have - only issue would be your processor / storage / work space.
Reeza
Super User
You can try a sql merge (untested). From a processing perspective I'm not sure which is faster or more efficient. But you can do it without sorting.

proc sql;
create table dset3 as
select var1, var2
from dset1 a
join dset2 b
on a.var2=b.var2
and a.var1 ne b.var1;
quit; Message was edited by: Reeza
Ksharp
Super User
Hi.
Maybe using proc format to change your dataset1 into format will be a good choice,
This method is efficient.Suggest posting some dummy data to see how to code it.

Ksharp
Patrick
Opal | Level 21
Hi Sandhya

If datasets with 2M records are huge and will crash the system depends on the environment you're in. For some of us 2M is not that huge.

It looks to me as if this is a once off task so performance will not be that important.

I therefore believe that you should give Scott's suggestion a go as it is easy to implement and less memory intensive than you might think.

You could use the sort option NOEQUALS to allow Proc Sort to use a simpler sort algorithm. Just give it a try.

If this anser is not good enough for you then have a look at these papers:
http://www2.sas.com/proceedings/sugi26/p104-26.pdf
http://support.sas.com/resources/papers/proceedings09/197-2009.pdf

Using indexes could be an alternate way to go.

HTH
Patrick
Peter_C
Rhodochrosite | Level 12
since you want to avoid sorting, I presume your data are not in an order useful to the query. Consider a hash table. How wide and how many are the keys? If the keys of table1 can be stored with indexes on var1 and var2, then you should be able to collect your query results with a single pass through each table.
However, sort/merge might take less time to get results if you have to addin the learning curve.

good luck
peterC

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 976 views
  • 0 likes
  • 7 in conversation