Hi team,
i am looking for any automated solution to comapare two large data which contain many variables , and print the results in excel by highliting the differences.
Thanks in advance.
Proc compare is used for comparing two datasets. This doesn't however give the output you want exactly so you will have to manipulate the output from that into what ever it is you want.
Another way is to do:
proc sql; create table out as select * from table1 except select * from table2 union all select * from table2 except select * from table1; quit;
Or something along those lines (this just shows rows which are different between the two, you would then need to write code to do highlighting and such like.
I would say proc compare output is the simplest to get and probably most detailed, but it isn't the output you want, which is very specific to you and hence probably no specific tool.
A final option, export to CSV and use a text compare tool maybe? Seems a bit of a faff just to get some highlighted output though.
Hi Team,
we have almost 20 tables which is a SCD TYPE2 with more than 250 columns in each .i need to find the difference of old and new records in each table and high lisght the differences in excel.
To break the requirement
1. i took a back of table 1 before updating the table with new file let say old_table with 2 rows
Old_table :
id name city processed_date
1 Scott sweden 26/07/2018
2 Scott America 04/07/2018
2. After running a job , my table will get updated with new values which is type 2 SCD stores in different rows . So i copy the updated .
New_table :
id name city processed_date
1 Scott Finland 18/09/2018
2 Scott America 04/07/2018
3. Now i rename the columns in old data such as (by using rename macro)
Old_table :
id name_0ld city_old processed_date_old
1 Scott sweden 26/07/2018
4. Now i need to get my output , by merging old and new data set with only mismath rows or updated data with highlighted differences
id name_old name city_old city processed_date_old processed_date
1 Scott Scott Sweden Finland 26/07/2018 18/09/2018
i can achieve this solution by data merge if i have limited columns , but i want to automate this for large data sets which have multiple columns.
Can some one help , with a generic code to compare and highlight the difference records as descibed above.
Thanks in advance
Have you looked at using Proc Compare?
I don't have a generic solution, but it's not that difficult to make one. If you can do it for one full data set and post that, you can then wrap it in a macro.
The reason I don' thave a generic solution is the requirements change slightly every time.
For example, I do have several generic macros that will report which variables are in which data sets which I more commonly need but for a dim change like this I would recommend a new macro.
Or you can parse PROC COMPARE. It really depends on what you want but there are several different OUT options that let you limit the output as desired.
@learn_SAS_23 wrote:
yes, i tried with proc compare The drawback of the procedure Proc Compare is its lengthy output if mismatches do exist. It is very
time consuming and labor extensive to find the information of interest, or pinpoint the exact changes in the dataset.
@learn_SAS_23 wrote:
Hi Team,
we have almost 20 tables which is a SCD TYPE2 with more than 250 columns in each .i need to find the difference of old and new records in each table and high lisght the differences in excel.
To break the requirement
1. i took a back of table 1 before updating the table with new file let say old_table with 2 rows
Old_table :
id name city processed_date
1 Scott sweden 26/07/2018
2 Scott America 04/07/2018
2. After running a job , my table will get updated with new values which is type 2 SCD stores in different rows . So i copy the updated .
New_table :
id name city processed_date
1 Scott Finland 18/09/2018
2 Scott America 04/07/2018
3. Now i rename the columns in old data such as (by using rename macro)
Old_table :
id name_0ld city_old processed_date_old
1 Scott sweden 26/07/2018
4. Now i need to get my output , by merging old and new data set with only mismath rows or updated data with highlighted differences
id name_old name city_old city processed_date_old processed_date
1 Scott Scott Sweden Finland 26/07/2018 18/09/2018
i can achieve this solution by data merge if i have limited columns , but i want to automate this for large data sets which have multiple columns.
Can some one help , with a generic code to compare and highlight the difference records as descibed above.
Thanks in advance
You say you have 20 data sets to compare. What do you expect this to look like after comparing all 20 of them?
It almost sounds like you would have 20 versions of each variable??
@learn_SAS_23 wrote:
in this code snippet i did for 3 columns, but i have to create a mismatch data set for 200 columns, can this can be done through macro , any suggestions please
Yes it can, but if you need this, PROC COMPARE really does a good job of doing that automatically. You have to do multiple checks though - one for if a record is present or not in the first place, one if it's different and what happens, if possible, if you have a key variable that is duplicated for some reason, which record would you check.
Otherwise to write a macro I'd start with:
1. Get list of variables from data set, minus KEY joining variables.
2. Make a RENAME statement so that all variables are renamed automatically (add old/new prefix to one of the data sets). You show this in your example but you need to automate it.
3. Merge using the KEY variable.
4. Set up two/four arrays and loop through checking for differences, two if you have all variables the same type, 4 if you have both numeric and characters that you need to compare.
5. Print report of differences
Make this a macro using the Key variables to join on and an output data set name.
data want;
set have;
array old_char (*) $ <list of character variables>;
array old_num(*) <list of numeric variables>;
array new_char(*) $ <list of new names>;
array new_num(*) <list of new names>;
array diff_char(*) $ diffC1-diffCn; *you may want to make new names here;
array diff_num(*) diffN1-diffNn;
*loop over character;
do i=1 to dim (old_char);
diff_char(i) = old_char(i) = new_char(i);
end;
*loop over numeric;
do i=1 to dim (old_num);
diff_num(i) = old_num(i) - new_num(i);
end;
run;
Sure, but please post back your macro so others can use it as well 😉
I ended up being wrong, you need 6 arrays, one for old/new/diff x 2 - one for character and numeric.
There should be an easier way.
Here's a PROC COMPARE example. You could transpose this output data set and get what you want I think and it's easier to wrap that in a macro program IMO.
data class_new;
set sashelp.class;
if age in (12, 13, 14) then height = height*2;
if age = 11 then delete;
run;
proc sort data=class_new;
by name; run;
proc sort data=sashelp.class out=class_old;
by name;run;
proc compare base=class_new compare=class_old outall out=want;
run;
@learn_SAS_23 wrote:
Thanks reeza,
Can you please send a sample code snippet for point number 4, using 4arrays
to verify data, am just learning sas
I made code for other points (1,2,3,5)
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: