Re: Combining Datasets

renjithr · Posted 07-11-2013 07:33 PM

Hi,

I have a requirement where I need to combine two datasets, but here is what is causing me trouble.

For example: master dataset

Emp_loc Emp_sal

USA 100000.00

Newyork 25000.00

California 25000.00

Texas 25000.00

Nevada 25000.00

Transaction dataset:

Emp_loc Emp_sal

Texas 30000.00

USA 105000.00

I want to update the master dataset, but need to keep the values of the EMP_Loc in the same order ie, after my change the dataset should look like this:

Updated Master dataset:

Emp_loc Emp_sal

USA 105000.00

Newyork 25000.00

California 25000.00

Texas 30000.00

Nevada 25000.00

Please share your ideas.

Thanks

Ksharp · Posted 07-11-2013 08:25 PM

Hash Table:

data master;
input Emp_loc $      Emp_sal ;
cards;
USA             100000.00
Newyork        25000.00
California       25000.00
Texas            25000.00
Nevada          25000.00
;
run;
 

 

data Transaction;
input Emp_loc $ Emp_sal ;
cards;
Texas     30000.00
USA       105000.00
;
run;
data want(drop=rc);
 if _n_ eq 1 then do;
  if 0 then set  Transaction;
  declare hash ha(dataset:'Transaction');
   ha.definekey('Emp_loc');
   ha.definedata('Emp_sal');
   ha.definedone();
 end;
set master;
 rc=ha.find();
run;

Ksharp

Jagadishkatam · Posted 07-11-2013 10:45 PM

The hash code used by Ksharp above is more efficient.

However there is alternative approach as well by Merge statement, please try the below code

data dat1;

input Emp_loc $ Emp_sal;

order=_n_;

cards;

USA 100000.00

Newyork 25000.00

California 25000.00

Texas 25000.00

Nevada 25000.00

;

run;

data dat2;

input Emp_loc $ Emp_sal;

cards;

Texas 30000.00

USA 105000.00

;

run;

proc sort data = dat1;

by emp_loc;

run;

proc sort data = dat2;

by emp_loc;

run;

data all;

merge dat1(in=a) dat2(in=b);

by emp_loc;

run;

proc sort data = all;

by order;

run;

Hope this helps you.

Thanks,

Jagadish

Thanks,
Jag

Scott_Mitchell · Posted 07-12-2013 05:18 AM

From what I understand Hash tables are very useful in instances where you have a large table and a small lookup table. There is a point where hash tables become less effecient, so it is important that you undertake sufficient testing to ensure you are on the right track. Therefore I have assumed this is a sample and have provided the below alternative.

DATA MASTER;

INPUT EMP_LOC $ EMP_SAL ;

CARDS;

USA 100000.00

NEWYORK 25000.00

CALIFORNIA 25000.00

TEXAS 25000.00

NEVADA 25000.00

;

RUN;

DATA TRANSACTION;

INPUT EMP_LOC $ EMP_SAL ;

CARDS;

TEXAS 30000.00

USA 105000.00

;

RUN;

PROC SQL;

UPDATE MASTER

SET EMP_SAL = (SELECT EMP_SAL

FROM TRANSACTION

WHERE MASTER.EMP_LOC = TRANSACTION.EMP_LOC)

WHERE EMP_LOC IN (SELECT EMP_LOC

FROM TRANSACTION)

;

QUIT;

Regards,

Scott

renjithr · Posted 07-12-2013 11:55 AM

Thank you all for sharing your thoughts!

Combining Datasets