SAS Procedures

helloSAS · Posted 08-15-2013 02:50 PM

how can I use proc compare for datasets that have different number of records?

Example: table A has 5000 records; table B has 5025 records.I want to spit out the 25 records that do not exist in table A and also any other differences.

Thanks

LarryWorley · Posted 08-15-2013 07:19 PM

So does your output look something like the following. I created a 10 record dataset and made a copy of it. In the copy I deleted the fifth record and changed the values of the non-id variable on the last five variables.

                                                                 The COMPARE Procedure
                                                                Comparison of WORK.X with WORK.Y
                                                                         (Method=EXACT)

Data Set Summary

Dataset Created Modified NVar NObs

WORK.X 15AUG13:17:47:52 15AUG13:17:47:52 2 10
WORK.Y 15AUG13:18:10:07 15AUG13:18:10:07 2 9

Variables Summary

Number of Variables in Common: 2.
Number of ID Variables: 1.

Observation Summary

Observation Base Compare ID

                                                            First Obs           1        1 i=1
                                                            First Unequal       6        5 i=6
                                                            Last Unequal      10        9 i=10
                                                            Last Obs          10        9 i=10

                                                Number of Observations in Common: 9.
                                                Number of Observations in WORK.X but not in WORK.Y: 1.
                                                Total Number of Observations Read from WORK.X: 10.
                                                Total Number of Observations Read from WORK.Y: 9.

Number of Observations with Some Compared Variables Unequal: 5.
Number of Observations with All Compared Variables Equal: 4.

Values Comparison Summary

                                                Number of Variables Compared with All Observations Equal: 0.
                                                Number of Variables Compared with Some Observations Unequal: 1.
                                                Total Number of Values which Compare Unequal: 5.
                                                Maximum Difference: 10.

Comparison of WORK.X with WORK.Y
(Method=EXACT)

All Variables Compared have Unequal Values

Variable Type Len Ndif MaxDif

j NUM 8 5 10.000

Value Comparison Results for Variables

                                                   _________________________________________________________
                                                             ||       Base    Compare
                                                          i ||          j          j      Diff.     % Diff
                                                    _______ || _________ _________ _________ _________
                                                             ||
                                                          6 ||   120.0000   130.0000    10.0000     8.3333
                                                          7 ||   140.0000   150.0000    10.0000     7.1429
                                                          8 ||   160.0000   170.0000    10.0000     6.2500
                                                          9 ||   180.0000   190.0000    10.0000     5.5556
                                                         10 ||   200.0000   210.0000    10.0000     5.0000
                                                   _________________________________________________________

Take a look at the output dataset options from proc compare and see if there is an output datset which you could post process to make a more compact report. The raw output from proc compare can be a little overwhelming when there are many data discrepencies.

View solution in original post

SASKiwi · Posted 08-15-2013 03:47 PM

Proc Compare works in two ways:

1) Compares record for record between two tables (1 with 1, 2 with 2 etc)

2) Compares records using key variables common in both tables that uniquely identify which record in the first table is to be compared with which record in the second table.

So do your tables A and B contain one of more key variables that will uniquely identify a record?

If so use proc compare with an ID statement using these variables.

LarryWorley · Posted 08-15-2013 03:49 PM

I like to use ID variables for this. If you have a set of columns which uniquely identifies each record in both data sets, then ID statement lets proc compare handle the missing records appropriately. So for example with two datasets with unique personids on each record and where the second data set contains 25 records not found on the first, this will workL

proc compare data=one

compare=two

;

id personid ;

run .

Note this assumes both data sets are sorted by personid.

Another option is to use a where clause on the one with more records, but you still have to worry about sort order.

proc compare data=one

compare=two (where = ( not new record condition))

;

run .

HTH

Larry

helloSAS · Posted 08-15-2013 04:19 PM

I tried with the Id statement like the below code. But it does not work.

proc sort data=prep7;by acc_id;run;

proc sort data=prep7o;by acc_id;run;

proc compare base=prep7 compare=prep7o; id acc_id; run;

The problem is, when i sort with acc_id at record 2400 there is a missing record in table A. And sas shows all reords to be different after that record.

LarryWorley · Posted 08-15-2013 05:34 PM

Are you sure your sorts worked? You can check your log to determine that. It does not make sense to me that if you sorted the table A by ACC_ID that record number 2400 should have a missing value, unless the 2399 record before it had missing values. I reach this conclusion because (at least on ASCII machines), missing values are always the smallest possible value, regardless of whether it is a string or numeric variable. I am real rusty on EBCDIC so if you are on a Z/OS machine I could be wrong.

If the sorts did work and you have 2399 missing values in each file, then the compare is pointing to a difference in your files.

Can you post some of the output from proc compare?

Anotherdream · Posted 08-15-2013 06:40 PM

IK don't believe he's saying his sort didn't work, what he's saying is the proc compare is throwing an error for every reord after number 2,400.

That's because BOTH datasets have the first 2,400 records in common, however his dataset #2 is missing record 2,401 that exists in Dataset number 1.

Therefore, every record after that point is now incorrect because SAS is trying to compare record 2,401 in dataset A to record 2,401 in dataset B, however datatset B's record 2,401 is NOT equal to the same one in dataset A.

Correct me if I'm wrong, but i'm pretty sure this is what you mean because i've had it happen to me before.

However, by using the ID variable on your Key, this should not happen. Proc Compare is supposed to get an output like this...


Number of Observations in Common	9
Number of Observations in WORK.person but not in WORK.person1	1
Total Number of Observations Read from WORK.person1	9
Total Number of Observations Read from WORK.person	10
Number of Observations with Some Compared Variables Unequal	0
Number of Observations with All Compared Variables Equal:	9

helloSAS · Posted 08-15-2013 07:04 PM

@Anotherdream: you are exactly right in understanding my issue.

But, I did not get the proc compare report like you showed it. Unless i misread it. I'll confirm it again. I do remember getting a list of all the records that are unequal and my output was so huge that it printed max 50 records.

I was only expecting the records that were not existing in my other table. Is there a way I can do this, since i need it as my validation report.

LarryWorley · Posted 08-15-2013 07:19 PM

So does your output look something like the following. I created a 10 record dataset and made a copy of it. In the copy I deleted the fifth record and changed the values of the non-id variable on the last five variables.

                                                                 The COMPARE Procedure
                                                                Comparison of WORK.X with WORK.Y
                                                                         (Method=EXACT)

Data Set Summary

Dataset Created Modified NVar NObs

WORK.X 15AUG13:17:47:52 15AUG13:17:47:52 2 10
WORK.Y 15AUG13:18:10:07 15AUG13:18:10:07 2 9

Variables Summary

Number of Variables in Common: 2.
Number of ID Variables: 1.

Observation Summary

Observation Base Compare ID

                                                            First Obs           1        1 i=1
                                                            First Unequal       6        5 i=6
                                                            Last Unequal      10        9 i=10
                                                            Last Obs          10        9 i=10

                                                Number of Observations in Common: 9.
                                                Number of Observations in WORK.X but not in WORK.Y: 1.
                                                Total Number of Observations Read from WORK.X: 10.
                                                Total Number of Observations Read from WORK.Y: 9.

Number of Observations with Some Compared Variables Unequal: 5.
Number of Observations with All Compared Variables Equal: 4.

Values Comparison Summary

                                                Number of Variables Compared with All Observations Equal: 0.
                                                Number of Variables Compared with Some Observations Unequal: 1.
                                                Total Number of Values which Compare Unequal: 5.
                                                Maximum Difference: 10.

Comparison of WORK.X with WORK.Y
(Method=EXACT)

All Variables Compared have Unequal Values

Variable Type Len Ndif MaxDif

j NUM 8 5 10.000

Value Comparison Results for Variables

                                                   _________________________________________________________
                                                             ||       Base    Compare
                                                          i ||          j          j      Diff.     % Diff
                                                    _______ || _________ _________ _________ _________
                                                             ||
                                                          6 ||   120.0000   130.0000    10.0000     8.3333
                                                          7 ||   140.0000   150.0000    10.0000     7.1429
                                                          8 ||   160.0000   170.0000    10.0000     6.2500
                                                          9 ||   180.0000   190.0000    10.0000     5.5556
                                                         10 ||   200.0000   210.0000    10.0000     5.0000
                                                   _________________________________________________________

Take a look at the output dataset options from proc compare and see if there is an output datset which you could post process to make a more compact report. The raw output from proc compare can be a little overwhelming when there are many data discrepencies.

helloSAS · Posted 08-16-2013 10:01 AM

Thank you all for your input. It worked now.

My ID variable had duplicate records. Hence I was not getting desired compare output. I added another variable that makes the ID unique, that worked.

SAS Procedures

compare

Re: compare

Re: compare

Re: compare

Re: compare

Re: compare

Re: compare

Re: compare

Re: compare

Re: compare

Comparing Alerting Capabilities of Alertmanager with Grafana

A meta analysis for sensitivity comparing two diagnostic tools

Compare rows in a dataset

Comparing Prime Computations Using the New CAS Gateway Action Set

How do i make exception when comparing dates and flag

Follow Us

What is...

SAS Procedures

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...