Hi,
I have a requirement where I need to combine two datasets, but here is what is causing me trouble.
For example: master dataset
Emp_loc Emp_sal
USA 100000.00
Newyork 25000.00
California 25000.00
Texas 25000.00
Nevada 25000.00
Transaction dataset:
Emp_loc Emp_sal
Texas 30000.00
USA 105000.00
I want to update the master dataset, but need to keep the values of the EMP_Loc in the same order ie, after my change the dataset should look like this:
Updated Master dataset:
Emp_loc Emp_sal
USA 105000.00
Newyork 25000.00
California 25000.00
Texas 30000.00
Nevada 25000.00
Please share your ideas.
Thanks
Hash Table:
data master; input Emp_loc $ Emp_sal ; cards; USA 100000.00 Newyork 25000.00 California 25000.00 Texas 25000.00 Nevada 25000.00 ; run; data Transaction; input Emp_loc $ Emp_sal ; cards; Texas 30000.00 USA 105000.00 ; run; data want(drop=rc); if _n_ eq 1 then do; if 0 then set Transaction; declare hash ha(dataset:'Transaction'); ha.definekey('Emp_loc'); ha.definedata('Emp_sal'); ha.definedone(); end; set master; rc=ha.find(); run;
Ksharp
The hash code used by Ksharp above is more efficient.
However there is alternative approach as well by Merge statement, please try the below code
data dat1;
input Emp_loc $ Emp_sal;
order=_n_;
cards;
USA 100000.00
Newyork 25000.00
California 25000.00
Texas 25000.00
Nevada 25000.00
;
run;
data dat2;
input Emp_loc $ Emp_sal;
cards;
Texas 30000.00
USA 105000.00
;
run;
proc sort data = dat1;
by emp_loc;
run;
proc sort data = dat2;
by emp_loc;
run;
data all;
merge dat1(in=a) dat2(in=b);
by emp_loc;
run;
proc sort data = all;
by order;
run;
Hope this helps you.
Thanks,
Jagadish
From what I understand Hash tables are very useful in instances where you have a large table and a small lookup table. There is a point where hash tables become less effecient, so it is important that you undertake sufficient testing to ensure you are on the right track. Therefore I have assumed this is a sample and have provided the below alternative.
DATA MASTER;
INPUT EMP_LOC $ EMP_SAL ;
CARDS;
USA 100000.00
NEWYORK 25000.00
CALIFORNIA 25000.00
TEXAS 25000.00
NEVADA 25000.00
;
RUN;
DATA TRANSACTION;
INPUT EMP_LOC $ EMP_SAL ;
CARDS;
TEXAS 30000.00
USA 105000.00
;
RUN;
PROC SQL;
UPDATE MASTER
SET EMP_SAL = (SELECT EMP_SAL
FROM TRANSACTION
WHERE MASTER.EMP_LOC = TRANSACTION.EMP_LOC)
WHERE EMP_LOC IN (SELECT EMP_LOC
FROM TRANSACTION)
;
QUIT;
Regards,
Scott
Thank you all for sharing your thoughts!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.