BookmarkSubscribeRSS Feed
gyambqt
Obsidian | Level 7

Hi Experts,

I have two tables: 

Table1 and Table2

 

Each table has 1000 columns (column1....column100) and each column has been defined as character with 5000 length

Columns in each table are 100% identical in type and length

 

95% of the columns in each table empty and no value

 

I would like to compare two tables to see if there is any difference in data.

For example: table1 has rows which not existed in table2 or table2 has rows which not existed in table1 such as:

 

Proc sql;
create table diff1 as
select * from table1
EXCEPT
select * from table2
;
quit;

Proc sql;
create table diff2 as
select * from table2
EXCEPT
select * from table1
;
quit;

 

Proc sql noprint;
select sum(Diff_Count) into :TotalChanges from
(
(select count(*) as Diff_Count from diff1)
union all
(select count(*) as Diff_Count from diff2)
) x
;
quit;

(I want &TotalChanges above)

last condition is table1 always has data but table2 sometimes empty

 

How to program this most efficiently due to large number of columns and length?

 

4 REPLIES 4
Reeza
Super User

Efficient in run time or programming time?

What do you need as output?

 

 

gyambqt
Obsidian | Level 7

I have modified my questions. The output wanted can be the difference rows between two tables.

SASKiwi
PROC Star

What type of row comparison is required? Is it row-by-row (row 1 with row 1, row 2 with row 2) or is by row key(s) that uniquely identifies each row? PROC COMPARE is my usual starting point for table comparisons and it can do either type of row comparison. If table 2 is sometimes empty then I would test that condition first and not compare further as there no point trying to compare an empty table.

ballardw
Super User

Here is a brief example of using Proc Compare with data source you should have available. First create set that is a little different with a few values just to see that they are reported. Then use the default output.

 

data work.class;
   set sashelp.class;
   If _n_ in (3, 12) then call missing(sex,age);
run;

proc compare base=sashelp.class
             compare=work.class
;
run;

The first part of the report will report on the table construction such as if there are different numbers of records or variables or variables with different attributes.

 

Then there is a summary of how many records have unequal values and how many records have all the values the same and variable differences summary.

 

Details will follow of all of the variables with differences.

Options on the Proc statement control which output is created and can create data sets of different types of summaries. Read the documentation and play with a small example such as I provided to see the results of the different options.

 

One big warning: if the data sets have been sorted differently then you will get lots of mismatches as there is a record by record comparison. So you may need to sort the data sets to have the same order.

 

There are options to compare differently named variables or use specific variable combinations for matching.

 

You can even compare different variables in the same set if you should have identical (or different) values and want to confirm that is the case.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 491 views
  • 2 likes
  • 4 in conversation