DATA Step, Macro, Functions and more

2 data sets

Reply
Contributor
Posts: 54

2 data sets

Hi all,

I have 2 datasets with the same variables. One data set has about a million records with the other containing about 2000. The main variable is accountnumber.

So, herein lies my problem : The big dataset contains some of the account numbers from the smaller dataset, but the accompanying data with them is wrong.

I need to have SAS check the large dataset for any of the account numbers found in the smaller data set and then remove them. After they are removed, I need to combine the datasets. Any suggestions?
Super Contributor
Posts: 359

Re: 2 data sets

You should be able to UPDATE the large one with the small by account number.
Contributor
Posts: 54

Re: 2 data sets

Won't that just replace the account numbers that are already there? Some of them aren't in the big set and need to be added. I thought UPDATE only updated the already present account numbers?
Super Contributor
Posts: 359

Re: 2 data sets

No it adds the new ones, as well as replacing existing. I have used it to bring in daily transactions extensively.
Super Contributor
Posts: 474

Re: 2 data sets

UPDATE will do.

If you have both tables sorted by account number, then you could also perform a DATA MERGE between the two tables, being the smaller one on the right side of the MERGE statement (to overwrite values of the left side table).

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.
Valued Guide
Posts: 2,175

Re: 2 data sets

consider MODIFY
it allows update-in-place
Contributor
Posts: 54

Re: 2 data sets

Thank you everyone for the help.

I have a change in plans. So I have the dataset with 2 million records give or take. I then have a smaller dataset of about 2000 records. Both have the same variables and names. Once again the main variable I am using is accountnumber. Of the 2000 records in the smaller dataset, there are about 1500 of those same account numbers in the larger dataset. Only the account numbers would be the same, with the rest of the data on that line being different. I need to tell SAS to delete the entries in larger file that have matching account numbers in the smaller file. So, 1500 or so records would be deleted from the larger file that have matching account numbers in the smaller file. Is this possible?
Super Contributor
Posts: 359

Re: 2 data sets

proc sql;
delete from data1 where acct_num in (select acct_num from data2);
Contributor
Posts: 54

Re: 2 data sets

Can't thank everyone enough.
Super Contributor
Super Contributor
Posts: 3,174

Re: 2 data sets

I sure hope that someone's backing up your data (oh, yes, with recoverability in mind)....with multiple versions.

Scott Barry
SBBWorks, Inc.
Super Contributor
Posts: 474

Re: 2 data sets

Yes it possible, with at least more than 4 different techniques.

I will point just one.

SORT both datasets by acountnumber, then match merge between the two (DATA MERGE) with subsetting IF for non matching observation.

Something like this.

data OUT;
merge BIG (in=BIG) SMALL (in=SMALL);
by ACCOUNTNUMBER; * match by ACCOUNTNUMBER;
if not (BIG and SMALL); * if non matching key, then output;
run;

More info on combining datasets:
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/a001303248.htm
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/a002645476.htm

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.
Ask a Question
Discussion stats
  • 10 replies
  • 218 views
  • 0 likes
  • 5 in conversation