- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi team,
i am looking for any automated solution to comapare two large data which contain many variables , and print the results in excel by highliting the differences.
Thanks in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Proc compare is used for comparing two datasets. This doesn't however give the output you want exactly so you will have to manipulate the output from that into what ever it is you want.
Another way is to do:
proc sql; create table out as select * from table1 except select * from table2 union all select * from table2 except select * from table1; quit;
Or something along those lines (this just shows rows which are different between the two, you would then need to write code to do highlighting and such like.
I would say proc compare output is the simplest to get and probably most detailed, but it isn't the output you want, which is very specific to you and hence probably no specific tool.
A final option, export to CSV and use a text compare tool maybe? Seems a bit of a faff just to get some highlighted output though.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Team,
we have almost 20 tables which is a SCD TYPE2 with more than 250 columns in each .i need to find the difference of old and new records in each table and high lisght the differences in excel.
To break the requirement
1. i took a back of table 1 before updating the table with new file let say old_table with 2 rows
Old_table :
id name city processed_date
1 Scott sweden 26/07/2018
2 Scott America 04/07/2018
2. After running a job , my table will get updated with new values which is type 2 SCD stores in different rows . So i copy the updated .
New_table :
id name city processed_date
1 Scott Finland 18/09/2018
2 Scott America 04/07/2018
3. Now i rename the columns in old data such as (by using rename macro)
Old_table :
id name_0ld city_old processed_date_old
1 Scott sweden 26/07/2018
4. Now i need to get my output , by merging old and new data set with only mismath rows or updated data with highlighted differences
id name_old name city_old city processed_date_old processed_date
1 Scott Scott Sweden Finland 26/07/2018 18/09/2018
i can achieve this solution by data merge if i have limited columns , but i want to automate this for large data sets which have multiple columns.
Can some one help , with a generic code to compare and highlight the difference records as descibed above.
Thanks in advance
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have you looked at using Proc Compare?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
time consuming and labor extensive to find the information of interest, or pinpoint the exact changes in the dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't have a generic solution, but it's not that difficult to make one. If you can do it for one full data set and post that, you can then wrap it in a macro.
The reason I don' thave a generic solution is the requirements change slightly every time.
For example, I do have several generic macros that will report which variables are in which data sets which I more commonly need but for a dim change like this I would recommend a new macro.
Or you can parse PROC COMPARE. It really depends on what you want but there are several different OUT options that let you limit the output as desired.
@learn_SAS_23 wrote:
yes, i tried with proc compare The drawback of the procedure Proc Compare is its lengthy output if mismatches do exist. It is very
time consuming and labor extensive to find the information of interest, or pinpoint the exact changes in the dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@learn_SAS_23 wrote:
Hi Team,
we have almost 20 tables which is a SCD TYPE2 with more than 250 columns in each .i need to find the difference of old and new records in each table and high lisght the differences in excel.
To break the requirement
1. i took a back of table 1 before updating the table with new file let say old_table with 2 rows
Old_table :
id name city processed_date
1 Scott sweden 26/07/2018
2 Scott America 04/07/2018
2. After running a job , my table will get updated with new values which is type 2 SCD stores in different rows . So i copy the updated .
New_table :
id name city processed_date
1 Scott Finland 18/09/2018
2 Scott America 04/07/2018
3. Now i rename the columns in old data such as (by using rename macro)
Old_table :
id name_0ld city_old processed_date_old
1 Scott sweden 26/07/2018
4. Now i need to get my output , by merging old and new data set with only mismath rows or updated data with highlighted differences
id name_old name city_old city processed_date_old processed_date
1 Scott Scott Sweden Finland 26/07/2018 18/09/2018
i can achieve this solution by data merge if i have limited columns , but i want to automate this for large data sets which have multiple columns.
Can some one help , with a generic code to compare and highlight the difference records as descibed above.
Thanks in advance
You say you have 20 data sets to compare. What do you expect this to look like after comparing all 20 of them?
It almost sounds like you would have 20 versions of each variable??
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
like i explained in my example
id name_old name city_old city processed_date_old processed_date
1 Scott Scott Sweden Finland 26/07/2018 18/09/2018
if i can use merge
data Merged;
merge Table_old_1 (in = x) Table_new_1(in = y);
by id;
if x = 1 and y = 1;
run;
proc sql;
create table MISMATCH as
select a.id,a.name_old,a.name_new,
a.city_old , b.city_new,
a.processed_date_old,b.processed_date_new,
from Merged a, Merged b
where a.id=b.id
and ( (a.name_old ne b.name_new) or
(a.city_old ne b.city_new) or
(a.processed_date_old ne b.processed_date_new) )
;quit;
in this code snippet i did for 3 columns, but i have to create a mismatch data set for 200 columns, can this can be done through macro , any suggestions please
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@learn_SAS_23 wrote:
in this code snippet i did for 3 columns, but i have to create a mismatch data set for 200 columns, can this can be done through macro , any suggestions please
Yes it can, but if you need this, PROC COMPARE really does a good job of doing that automatically. You have to do multiple checks though - one for if a record is present or not in the first place, one if it's different and what happens, if possible, if you have a key variable that is duplicated for some reason, which record would you check.
Otherwise to write a macro I'd start with:
1. Get list of variables from data set, minus KEY joining variables.
2. Make a RENAME statement so that all variables are renamed automatically (add old/new prefix to one of the data sets). You show this in your example but you need to automate it.
3. Merge using the KEY variable.
4. Set up two/four arrays and loop through checking for differences, two if you have all variables the same type, 4 if you have both numeric and characters that you need to compare.
5. Print report of differences
Make this a macro using the Key variables to join on and an output data set name.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you please send a sample code snippet for point number 4, using 4arrays
to verify data, am just learning sas
I made code for other points (1,2,3,5)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
set have;
array old_char (*) $ <list of character variables>;
array old_num(*) <list of numeric variables>;
array new_char(*) $ <list of new names>;
array new_num(*) <list of new names>;
array diff_char(*) $ diffC1-diffCn; *you may want to make new names here;
array diff_num(*) diffN1-diffNn;
*loop over character;
do i=1 to dim (old_char);
diff_char(i) = old_char(i) = new_char(i);
end;
*loop over numeric;
do i=1 to dim (old_num);
diff_num(i) = old_num(i) - new_num(i);
end;
run;
Sure, but please post back your macro so others can use it as well 😉
I ended up being wrong, you need 6 arrays, one for old/new/diff x 2 - one for character and numeric.
There should be an easier way.
Here's a PROC COMPARE example. You could transpose this output data set and get what you want I think and it's easier to wrap that in a macro program IMO.
data class_new;
set sashelp.class;
if age in (12, 13, 14) then height = height*2;
if age = 11 then delete;
run;
proc sort data=class_new;
by name; run;
proc sort data=sashelp.class out=class_old;
by name;run;
proc compare base=class_new compare=class_old outall out=want;
run;
@learn_SAS_23 wrote:
Thanks reeza,
Can you please send a sample code snippet for point number 4, using 4arrays
to verify data, am just learning sas
I made code for other points (1,2,3,5)