BookmarkSubscribeRSS Feed
Reeza
Super User
Its two steps. First remove any actual duplicates using EXCEPT.
Then use MODIFY on the modified transaction data set which now only has the changed records. Then use PROC COMPARE to find the exact differences if you had wanted to.
elwayfan446
Barite | Level 11

@Reeza, will the new transactional dataset also include the new records not in the master?

 

Reeza
Super User
Yes, that's the exact purpose of the EXCEPT step. It only removes the exact matches. You will need to test it, for example you may need to exclude the dates from the query since it will have an updated date or something.
elwayfan446
Barite | Level 11

Yes, there will be fields that could change in the transactional dataset each day that will need to be updated in the master.  If I understand this right, I shouldn't have to write any criteria.  Once the exact matches are removed, I am left with everything else... new records and changed records.  Once I modify, all of those particular records will be overwritten with any new data for any field that is updated.

 

I will look up more specifics on proc compare since I have not used it but does it compare on a dataset level or do you have to specify which variables to look at?

Reeza
Super User

@elwayfan446 wrote:

Yes, there will be fields that could change in the transactional dataset each day that will need to be updated in the master.  If I understand this right, I shouldn't have to write any criteria.  Once the exact matches are removed, I am left with everything else... new records and changed records.  Once I modify, all of those particular records will be overwritten with any new data for any field that is updated.

 

I will look up more specifics on proc compare since I have not used it but does it compare on a dataset level or do you have to specify which variables to look at?


It has the option of either. Note that you will need to save a version of your table before the update so you can compare it. 

elwayfan446
Barite | Level 11

Gotcha.

 

Cool, I will give this a try and let you know how it works.  I appreciate all the advice.

SASKiwi
PROC Star

@elwayfan446 - The example SQL code I posted above is the basis of how I do updating a master dataset from a transaction dataset. Once you get a dataset containing a list of the rows that have been added or changed, identified by a row key, then you can apply that in the second MODIFY step. 

elwayfan446
Barite | Level 11

@Reezaand @Tom, could one of you give me a generic example on the proper way to use the modify between the new dataset with only the records to be updated and master dataset?  Then, would I use the proc compare on it compared to the updated master dataset?

elwayfan446
Barite | Level 11
Actually, I have figured out what you were trying to tell me. Playing around with it a bit more and I will report back. Thanks!
elwayfan446
Barite | Level 11

@Reeza... with proc compare, how would I see which records have been added to the master dataset?  I can only see the base table, compare table, and a difference record when I do this.  Does the modify statement based on the new transactional dataset add new observations to the table that don't currently exist?

Tom
Super User Tom
Super User
PROC COMPARE does not do a good job of showing inserted or deleted observations.
I have 20+ year old macro that I use to run PROC COMPARE that first generates and lists the ADDED and DELETED records and then runs PROC COMPARE on just the overlapping observations.
Reeza
Super User

@elwayfan446 wrote:

  Does the modify statement based on the new transactional dataset add new observations to the table that don't currently exist?


Yes it does, you should test that it gives you what you need with small samples though. 

 


@elwayfan446 wrote:

how would I see which records have been added to the master dataset?  


When I run PROC COMPARE with an ID statement that shows the key variables I get this type of output. Not super helpful, but I assume you can also get some of that with the dates you've included as well. 

 

                                                                                                                            
                                                Comparison Results for Observations                                                 
                                                                                                                                    
                                  Observation 1 in WORK.CLASS1 not found in WORK.CLASS2: Name=Alfred.                               
                                                                                                                                    
                                  Observation 7 in WORK.CLASS1 not found in WORK.CLASS2: Name=Jane.                                 
                                                                                                                                    
                                  Observation 7 in WORK.CLASS2 not found in WORK.CLASS1: Name=Jaquie.                               
                                                                                                                                    
                                  Observation 17 in WORK.CLASS2 not found in WORK.CLASS1: Name=Sensei.    

  

Reeza
Super User

Here's the code I was using to test:

 

/*create fake data set*/
data class2;
    set sashelp.class end=eof;
    *drop two records;

    if name in ("Alfred", "Jane") then
        delete;
    output;
    *add two records;

    if eof then
        do;
            name='Jaquie';
            age=14;
            sex='F';
            weight=85;
            height=80;
            output;
            name='Sensei';
            age=14;
            sex='M';
            weight=90;
            height=80;
            output;
        end;
run;

*fake data set 2;

data class1;
    set sashelp.class;
run;

*sort both by name;

proc sort data=class1;
    by name;

proc sort data=class2;
    by name;
    *compare results;

proc compare data=class1 compare=class2 listall;
    id name;
run;
Reeza
Super User
Sorry for so many response, but in general, I agree with Tom, PROC COMPARE isn't as useful as it could be and a lot of people have their custom macros to show the differences.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 28 replies
  • 2013 views
  • 4 likes
  • 5 in conversation