BookmarkSubscribeRSS Feed
Aar684
Calcite | Level 5
Hi all,

I have 2 datasets with the same variables. One data set has about a million records with the other containing about 2000. The main variable is accountnumber.

So, herein lies my problem : The big dataset contains some of the account numbers from the smaller dataset, but the accompanying data with them is wrong.

I need to have SAS check the large dataset for any of the account numbers found in the smaller data set and then remove them. After they are removed, I need to combine the datasets. Any suggestions?
10 REPLIES 10
Flip
Fluorite | Level 6
You should be able to UPDATE the large one with the small by account number.
Aar684
Calcite | Level 5
Won't that just replace the account numbers that are already there? Some of them aren't in the big set and need to be added. I thought UPDATE only updated the already present account numbers?
Flip
Fluorite | Level 6
No it adds the new ones, as well as replacing existing. I have used it to bring in daily transactions extensively.
DanielSantos
Barite | Level 11
UPDATE will do.

If you have both tables sorted by account number, then you could also perform a DATA MERGE between the two tables, being the smaller one on the right side of the MERGE statement (to overwrite values of the left side table).

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.
Peter_C
Rhodochrosite | Level 12
consider MODIFY
it allows update-in-place
Aar684
Calcite | Level 5
Thank you everyone for the help.

I have a change in plans. So I have the dataset with 2 million records give or take. I then have a smaller dataset of about 2000 records. Both have the same variables and names. Once again the main variable I am using is accountnumber. Of the 2000 records in the smaller dataset, there are about 1500 of those same account numbers in the larger dataset. Only the account numbers would be the same, with the rest of the data on that line being different. I need to tell SAS to delete the entries in larger file that have matching account numbers in the smaller file. So, 1500 or so records would be deleted from the larger file that have matching account numbers in the smaller file. Is this possible?
Flip
Fluorite | Level 6
proc sql;
delete from data1 where acct_num in (select acct_num from data2);
Aar684
Calcite | Level 5
Can't thank everyone enough.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
I sure hope that someone's backing up your data (oh, yes, with recoverability in mind)....with multiple versions.

Scott Barry
SBBWorks, Inc.
DanielSantos
Barite | Level 11
Yes it possible, with at least more than 4 different techniques.

I will point just one.

SORT both datasets by acountnumber, then match merge between the two (DATA MERGE) with subsetting IF for non matching observation.

Something like this.

data OUT;
merge BIG (in=BIG) SMALL (in=SMALL);
by ACCOUNTNUMBER; * match by ACCOUNTNUMBER;
if not (BIG and SMALL); * if non matching key, then output;
run;

More info on combining datasets:
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/a001303248.htm
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/a002645476.htm

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1054 views
  • 0 likes
  • 5 in conversation