07-17-2017 12:59 PM
I'm merging two dataset, previous and current with a key in my by statement. I'd like to know which ones are new and which are changed from the previous dataset. I can get the new records from the IN= option, but how can I find which are changed?
data incurr(drop=date siteidp usubjidp catp testcdp resultp visitp visdatep visstp)
inprev(drop=key date siteidc usubjidc catc testcdc resultc visitc visdatec visstc)
merge prevdata(in=inprev) currdata(in=incurr);
if incurr and ^ inprev then output incurr;
else if inprev and ^ incurr then output inprev;
if inprev and incurr then output inboth;
07-17-2017 01:09 PM
It's extremely difficult to do this in the same DATA step that merges. But it's easier to do after the fact.
Extract a subset of PREVDATA, taking just those KEY values that appear in INBOTH. It looks like you know how to use IN= variables to subset, so that step should be relatively easy. (Just make sure that KEY is the only variable that you bring in from INBOTH when you subset.) Then run a PROC COMPARE, to compare that subset with INBOTH.
07-18-2017 10:15 AM
I first did the subset of prevdata with inboth and it gave me the same amount of obs that was in prevdata. Then I did the proc compare and it didn't show me which were changed from prevdata. What am I doing wrong?
merge inprev(in=prev) inboth(in=both);
sorts by key...
proc compare base=compare compare=inboth;
Value Comparison Results for Variables
|| Base Value Compare Value
Obs || key key
________ || ___________________+ ___________________+
1 || 01001001BASELINE(V3W 01001001BASELINE(V3W
2 || 01001001BASELINE(V3W 01001001BASELINE(V3W
3 || 01001001BASELINE(V3W 01001001BASELINE(V3W
4 || 01001001RUN-IN(V2W-3 01001001BASELINE(V3W
5 || 01001001RUN-IN(V2W-3 01001001BASELINE(V3W
6 || 01001001RUN-IN(V2W-3 01001001BASELINE(V3W
07-18-2017 12:47 PM
In merging, the values from INBOTH are always being used for all variables. They overwrite any values from INPREV. To overcome that, the MERGE statement should look like this:
merge inprev(in=prev) inboth(in=both keep=key);
07-17-2017 01:42 PM
*Create an indicator of origin of 'key'; data changes; merge prevdata(in=inprev) currdata(in=incurr); by key; length change_flag $ 24; if incurr and inprev then change_flag = "Both Files --- No Change"; else if inprev then change_flag = "Previous File Only"; else if incurr then change_flag = "Current File Only"; run; *Find frequencies of origins of 'key'; proc freq data = changes; tables change_flag; run;
07-17-2017 01:52 PM - edited 07-17-2017 01:55 PM
Are you trying to update your master table with new values from the transaction table. Then UPDATE statement maybe the best way and look my example code :
data one; infile datalines ; input val1 $ val2 $; valx=val2; datalines; A AA B BB C CC D DD E EE ; RUN; data two; infile datalines ; input val1 $ val2 $; datalines; A AA1 B BB C CC D DD1 E EE ; RUN;
proc sort data=one;
proc sort data=two;
by val1; data compare_one_two; update one two; by val1; IF val2 ^=valx then status="Updated"; run;
Hope this helps
07-18-2017 04:09 PM
The poster was asking about which key (merging) variables had changed. In your example, no key variables have changed, only the values associated with them.
07-18-2017 05:31 PM
I wrote a paper at the end of the previous century
with a macro to show the differences.
SUGI 23: %COMPARWS: Compare with Summary: A Macro Using Proc Compare