Hi all:
I am trying to create a report by comparing 2 sas datasets with SAME variable name but different values due to different extraction dates
----Goal: to highlight differences : New, Deleted, Updated and find out which record is different.
---The challenge is :
1.There are lots of variables about 20-30 in each ds to be compared ;
2. Variables are mixed with numerical var. and character including dates.
3. Need to highlight different colors in excel file per the difference
4. used many arrays and %let variables to identify old, new and flags
---Steps I used but very prone to mistake:
*1.Rename old datasets variables for comparing with new datasets by suffix old_ for comparation purpose; EG. TEST TO OLD_TEST
*2.Convert all selected Numeric variables to Character variables by adding suffix c_ . So I may use one set of array for all same type of variables.
EG. DATE TO C_DATE
*3. Drop original numerical variables and keep all c_: .
Otherwise the values will not be able to overwrite. DROP DATE;
*4. Rename again variables with suffix c_: Back to old & new data original names for merge and compare in data step;
EG: RENAME C_DATE =DATE
---Question:
Does any one have any code sample which might use proc content level to work on this ?
thanks a lot,
Purpleclothlady
You are going to have a hard time dealing with two data sets with the same name. Unless dealing with generation sets you only have one SAS data set of a given name in a library.
If you mean to different data sets then sort them using identification variables other than those you know to change so the order should be identical.
Use Proc compare. It will compare every value of all the variables and report on differences. There are multiple choices for output data sets and contents. I am not quite sure what you want.
If one data set may have a different number of records us a BY statement with the By variables used for sorting.
I suggest the sort and run Proc Compare with the defaults to see the output. The basic syntax is extremely simple:
Proc Compare base=thisdatasetname compare=thatdatasetname; run;
If variables are of different types with the same name you won't get a value comparison. There is a header section that describes differences in the data sets, like/unlike named variables, number of observations, variables with same name but different type, different properties such as Label or Format for the same name variables.
"Different colors in Excel" is a report function and will depend on what your rules are.
When trying to build a new process I first search lexjansen.com to see what's already developed:
https://www.mwsug.org/proceedings/2011/appdev/MWSUG-2011-AD09.pdf
This one seems somewhat close to what you're trying to do?
PROC COMPARE is probably the easiest starting point.
PROC COMPARE is a lot easier than the approach you are suggesting.
However is there a unique key for the rows in the table? If not you are going to have a hard time lining up the rows to compare.
hi Dear all:
thank you so much,👍 maybe I didn't explain the goal clearly.
I had first checked SASpaper . this is the exactly what I need to accomplish, there are many places need to be fixed though. I will try to use this as blueprint.
https://www.pharmasug.org/proceedings/2017/AD/PharmaSUG-2017-AD04.pdf
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.