Hi Experts,
I have two tables:
Table1 and Table2
Each table has 1000 columns (column1....column100) and each column has been defined as character with 5000 length
Columns in each table are 100% identical in type and length
95% of the columns in each table empty and no value
I would like to compare two tables to see if there is any difference in data.
For example: table1 has rows which not existed in table2 or table2 has rows which not existed in table1 such as:
Proc sql;
create table diff1 as
select * from table1
EXCEPT
select * from table2
;
quit;
Proc sql;
create table diff2 as
select * from table2
EXCEPT
select * from table1
;
quit;
Proc sql noprint;
select sum(Diff_Count) into :TotalChanges from
(
(select count(*) as Diff_Count from diff1)
union all
(select count(*) as Diff_Count from diff2)
) x
;
quit;
(I want &TotalChanges above)
last condition is table1 always has data but table2 sometimes empty
How to program this most efficiently due to large number of columns and length?
Efficient in run time or programming time?
What do you need as output?
I have modified my questions. The output wanted can be the difference rows between two tables.
What type of row comparison is required? Is it row-by-row (row 1 with row 1, row 2 with row 2) or is by row key(s) that uniquely identifies each row? PROC COMPARE is my usual starting point for table comparisons and it can do either type of row comparison. If table 2 is sometimes empty then I would test that condition first and not compare further as there no point trying to compare an empty table.
Here is a brief example of using Proc Compare with data source you should have available. First create set that is a little different with a few values just to see that they are reported. Then use the default output.
data work.class; set sashelp.class; If _n_ in (3, 12) then call missing(sex,age); run; proc compare base=sashelp.class compare=work.class ; run;
The first part of the report will report on the table construction such as if there are different numbers of records or variables or variables with different attributes.
Then there is a summary of how many records have unequal values and how many records have all the values the same and variable differences summary.
Details will follow of all of the variables with differences.
Options on the Proc statement control which output is created and can create data sets of different types of summaries. Read the documentation and play with a small example such as I provided to see the results of the different options.
One big warning: if the data sets have been sorted differently then you will get lots of mismatches as there is a record by record comparison. So you may need to sort the data sets to have the same order.
There are options to compare differently named variables or use specific variable combinations for matching.
You can even compare different variables in the same set if you should have identical (or different) values and want to confirm that is the case.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.