SAS Programming

GPatel · Posted 08-22-2017 10:19 PM

I am comparing two datasets, say A and B.

My SAS code is :

proc compare base=N_402612(Keep=receipt ) compare=N_389132(Keep=receipt ) out=check ;
run;

LOG:

23         GOPTIONS ACCESSIBLE;
24         	
24       !  proc compare base=N_402612(Keep=receipt ) compare=N_389132(Keep=receipt ) out=check  ;
25         
26         	run;

NOTE: There were 402612 observations read from the data set WORK.N_402612.
NOTE: There were 389132 observations read from the data set WORK.N_389132.
NOTE: The data set WORK.CHECK has 389132 observations and 3 variables.

Results:

                                                                                                                                    
                                                                                                                                    
Observation Summary                                                                                                                 
                                                                                                                                    
Observation      Base  Compare                                                                                                      
                                                                                                                                    
First Obs           1        1                                                                                                      
First Unequal       9        9                                                                                                      
Last  Unequal  389132   389132                                                                                                      
Last  Match    389132   389132                                                                                                      
Last  Obs      402612        .                                                                                                      
                                                                                                                                    
Number of Observations in Common: 389132.                                                                                           
Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.                                                            
Total Number of Observations Read from WORK.N_402612: 402612.                                                                       
Total Number of Observations Read from WORK.N_389132: 389132.                                                                       
                                                                                                                                    
Number of Observations with Some Compared Variables Unequal: 389124.                                                                
Number of Observations with All Compared Variables Equal: 8.                                                                        
                                                                                                                                    
                                                                                                                                    
Values Comparison Summary                                                                                                           
                                                                                                                                    
Number of Variables Compared with All Observations Equal: 0.                                                                        
Number of Variables Compared with Some Observations Unequal: 1.                                                                     
Total Number of Values which Compare Unequal: 389124.                                                                               
                                                                                                                                    
                                                                                                                                    
All Variables Compared have Unequal Values                                                                                          
                                                                                                                                    
Variable  Type  Len   Label          Ndif   MaxDif                                                                                  
                                                                                                                                    
Receipt   CHAR   13   Recipt Number389124

In the output dataset Check, I want to keep only n=13480 records in my output dataset check.

Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.

GPatel · Posted 08-25-2017 06:33 AM

Well, very good responses recieved from Reeza, ChrisNZ, Ron_Fehd_macro_maven, and Quentin.

Each of them shed very intereting insights and comments.

It was found that my dataset A has n=xxxx,xxxx number of UNIQUE records not observed in dataset B. Similarly, dataset B has n=xxx,xxx number of Unique records not observed in dataset A.

So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.

Thanks to all who have responded.

Girish Patel

View solution in original post

Reeza · Posted 08-23-2017 12:21 AM

An SQL query to do that is likely easier than trying to get the numbers out of PROC COMPARE. Otherwise look at the OUTDIFF data set.

Reeza · Posted 08-23-2017 12:22 AM

Also, note that value is the difference, it did a line by line comparison when I suspect you're trying to check for a specific variable. You need to add that option to proc compare, I think it's a VAR statement or ID statement.

GPatel · Posted 08-23-2017 06:52 AM

Thanks for your insight and suggested solution. Proc SQL and Data Merge doesn't give me correct/exact counts. I need to

create a SAS dataset of n=13,480 records as outlined by Proc Compare results.

Number of Observations in WORK.N_402612 but not in WORK.N_389132: 13480.

I hope somebody will shed some lights and offers desired solution.

Reeza · Posted 08-23-2017 10:43 AM

Your PROC COMPARE is wrong, so expecting that number in the first place is wrong.

ChrisNZ · Posted 08-23-2017 12:33 AM

You want to only keep 10,000 records? Which ones?

How does proc compare know?

As @Reeza said, you probably need a key (an ID statement in proc compare, or another join like a proc sql) to give some sense to your comparison.

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 08-23-2017 07:00 PM

Let's try another way.

What don't you just look at the first 13,480 records? or the the last 13,480 records?

Why not? How do *you* know these are not the ones you are after?

High-Performance SAS Coding - Third Edition

Ron_MacroMaven · Posted 08-23-2017 07:16 PM

these papers describes macros that may have hints to what you want to do.

http://www.sascommunity.org/wiki/Macro_ComparWS_Compare_with_summary

http://www.sascommunity.org/wiki/Macro_Extract

Ron Fehd comparatively, a maven

GPatel · Posted 08-23-2017 07:21 PM

I did try it. No luck.
Is there any way to isolate only n=13,480 records?

Reeza · Posted 08-23-2017 09:28 PM

Of course, but you need to define how, what are the rules that define what to be extracted. You haven't commented on the fact that the PROC COMPARE is wrong and that you don't know the difference. You don't know how many are in one and not the other, because you can have duplicates. If each data set has no duplicate records then you may be ok.

If you have no duplicates in either data set this approach may work:

proc sort data=have1; by ID_VAR;
proc sort data=have2; by ID_VAR;

data both /*includes records from both datasets*/
        in1 /*records only in have1*/
       in2 /*records only in have2*/;
merge have1 (in=a)
           have2 (in=b);

by ID_VAR;

if a and b then output both;
if a and not b then output in1;
if b and not a then output in2;

run;

Quentin · Posted 08-23-2017 09:54 PM

Interesting. There is a LISTBASEOBS options which will list the records in the base dataset only, but I don't see an OUTBASEOBS option that would output those records. So I'm going to go out on a limb and say no, PROC COMPARE can't do it.

But do note the point made above by many folks. Right now you are doing a one-to-one sequential comparison (first record of each dataset is compared, then second record of each, etc.) When you compare N_402612 to N_389132, the 13,480 records reported as being in N_402612 are simply the last 13,480 records from N_402612 .

You are doing the equivalent of:

proc compare base=sashelp.class compare=sashelp.class(obs=10) ;
run;

Which will report a difference of 9 records.

If you really want the last 13,480 records from N_402612 you can get them like:

data want;
  set N_402612(firstobs=%eval(402612-13479));
run;

The Boston Area SAS Users Group is hosting free webinars!
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.

ChrisNZ · Posted 08-23-2017 11:19 PM

Or maybe you just want:

proc sql;
select * from TAB1 
except 
select * from TAB2;
quit;

from the paucity of your explanations, it is hard to guess what you want.

High-Performance SAS Coding - Third Edition

GPatel · Posted 08-25-2017 06:33 AM

Well, very good responses recieved from Reeza, ChrisNZ, Ron_Fehd_macro_maven, and Quentin.

Each of them shed very intereting insights and comments.

It was found that my dataset A has n=xxxx,xxxx number of UNIQUE records not observed in dataset B. Similarly, dataset B has n=xxx,xxx number of Unique records not observed in dataset A.

So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.

Thanks to all who have responded.

Girish Patel

Reeza · Posted 08-25-2017 11:09 AM

@GPatel wrote:

So, I learned that Proc Compare should not be employed, but data step or SQL is a best solution. Case closed.

That's not correct. You can use PROC COMPARE, but you were not using it correctly. You misinterpreted the results because you didn't have a BY or ID or VAR/WITH statements.

SAS Programming

Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Re: Proc Compare

Comparing Attributes of Common Variables

Proc Compare

Proc compare

proc compare

Viya 2020.1: SID New ‘Compare Scenarios’ Feature

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...